Abstract. A general notion of bootstrapped $\phi$-divergence estimates constructed
by exchangeably weighting the sample is introduced. Asymptotic properties of these
generalized bootstrapped $\phi$-divergence estimates are obtained by means of
empirical process theory, and are applied to construct bootstrap confidence
sets with asymptotically correct coverage probability. Some practical problems
are discussed, including, in particular, the choice of the escort parameter, and several
examples of divergences are investigated. Simulation results are provided to
illustrate the finite-sample performance of the proposed estimators.
1. Introduction

The $\phi$-divergence modeling has proved to be a flexible tool and provides a powerful
statistical modeling framework in a variety of applied and theoretical contexts
[refer to Broniatowski and Keziou (2009), Pardo (2006) and Liese and Vajda (2006,
1987) and the references therein]. For good recent sources of references to the research
literature in this area along with statistical applications, consult Basu et al. (2011)
and Pardo (2006). Unfortunately, in general, the limiting distribution of the estimators,
or of their functionals, based on $\phi$-divergences depends crucially on the unknown
distribution, which is a serious problem in practice. To circumvent this matter, we
shall propose, in this work, a general bootstrap of $\phi$-divergence based estimators and
study some of its properties by means of sophisticated empirical process techniques.
A major application for an estimator is in the calculation of confidence intervals.
By far the most favored confidence interval is the standard confidence interval based
on a normal or a Student t-distribution. Such standard intervals are useful tools, 
but they are based on an approximation that can be quite inaccurate in practice. 
Bootstrap procedures are an attractive alternative. One way to look at them is as 
procedures for handling data when one is not willing to make assumptions about the 
parameters of the populations from which one sampled. The most that one is willing 
to assume is that the data are a reasonable representation of the population from 
which they come. One then resamples from the data and draws inferences about 
the corresponding population and its parameters. The resulting confidence intervals 
have received the most theoretical study of any topic in the bootstrap analysis. 
Our main findings, which are analogous to those of Cheng and Huang (2010), are
summarized as follows. The $\phi$-divergence estimator $\widehat{\alpha}_\phi(\theta)$ and the bootstrap
$\phi$-divergence estimator $\widehat{\alpha}^*_\phi(\theta)$ are obtained by optimizing the objective function $h(\theta,\alpha)$
based on the independent and identically distributed [i.i.d.] observations $X_1,\ldots,X_n$
and the bootstrap sample $X_1^*,\ldots,X_n^*$, respectively,

$$\widehat{\alpha}_\phi(\theta) := \arg\sup_{\alpha\in\Theta} \frac{1}{n}\sum_{i=1}^n h(\theta,\alpha,X_i), \qquad (1.1)$$

$$\widehat{\alpha}^*_\phi(\theta) := \arg\sup_{\alpha\in\Theta} \frac{1}{n}\sum_{i=1}^n h(\theta,\alpha,X_i^*), \qquad (1.2)$$

where $X_1^*,\ldots,X_n^*$ are independent draws with replacement from the original sample.
We shall mention that $\widehat{\alpha}^*_\phi(\theta)$ can alternatively be expressed as

$$\widehat{\alpha}^*_\phi(\theta) = \arg\sup_{\alpha\in\Theta} \frac{1}{n}\sum_{i=1}^n W_{ni}\, h(\theta,\alpha,X_i), \qquad (1.3)$$

where the bootstrap weights are given by

$$(W_{n1},\ldots,W_{nn}) \sim \mathrm{Multinomial}(n; n^{-1},\ldots,n^{-1}).$$
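The equivalence between resampling with replacement and multinomial weights can be checked numerically. The following is a minimal sketch, not part of the original development: the helper names `efron_weights` and `weighted_mean`, the seed and the sample size are illustrative assumptions, and a plain average stands in for the criterion $h$.

```python
import random
from collections import Counter


def efron_weights(n, rng):
    """Return W_n1, ..., W_nn: how often each index appears in n draws with replacement."""
    counts = Counter(rng.randrange(n) for _ in range(n))
    return [counts[i] for i in range(n)]


def weighted_mean(values, weights):
    """(1/n) * sum_i W_ni * values_i, the weighted form used in (1.3)."""
    return sum(w * v for w, v in zip(weights, values)) / len(values)


rng = random.Random(0)                      # assumed seed, for reproducibility only
x = [rng.gauss(0.0, 1.0) for _ in range(50)]
w = efron_weights(len(x), rng)

# The resample implied by the weights: X_i repeated W_ni times.
resample = [xi for xi, wi in zip(x, w) for _ in range(wi)]
resample_mean = sum(resample) / len(resample)

# The weights sum to n, and both ways of averaging agree up to rounding.
print(sum(w), abs(weighted_mean(x, w) - resample_mean) < 1e-12)
```

The same identity holds for any function of the observations, which is why (1.2) and (1.3) define the same estimator.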

In this paper, we shall consider the more general exchangeable bootstrap weighting
scheme that includes Efron's bootstrap [Efron (1979) and Efron and Tibshirani
(1993)]. The general resampling scheme was first proposed in Rubin (1981) and
extensively studied by Bickel and Freedman (1981), who suggested the name "weighted
bootstrap", e.g., the Bayesian bootstrap when $(W_{n1},\ldots,W_{nn}) = (D_{n1},\ldots,D_{nn})$ is equal
in distribution to the vector of $n$ spacings of $n-1$ ordered uniform $(0,1)$ random
variables, that is

$$(D_{n1},\ldots,D_{nn}) \sim \mathrm{Dirichlet}(n; 1,\ldots,1).$$

The interested reader may refer to Lo (1993). The case

$$(D_{n1},\ldots,D_{nn}) \sim \mathrm{Dirichlet}(n; 4,\ldots,4)$$

was considered in Weng (1989, Remark 2.3) and Zheng and Tu (1988, Remark 5).
The Bickel and Freedman result concerning the empirical process has been subsequently
generalized for empirical processes based on observations in $\mathbb{R}^d$, $d > 1$, as
well as in very general sample spaces and for various set and function-indexed random
objects [see, for example Beran (1984), Beran and Millar (1986), Beran et al.
(1987), Gänssler (1992), Lohse (1987)]. In this framework, Csörgő and Mason (1989)
developed similar results for a variety of other statistical functions. This line of research
was continued in the work of Giné and Zinn (1989, 1990). There is a huge
literature on the application of the bootstrap methodology to nonparametric kernel 
density and regression estimation, among other statistical procedures, and it is not 
the purpose of this paper to survey this extensive literature. This being said, it is 
worthwhile mentioning that the bootstrap as per Efron's original formulation (see 
Efron (1979)) presents some drawbacks. Namely, some observations may be used 
more than once while others are not sampled at all. To overcome this difficulty, 
a more general formulation of the bootstrap has been devised: the weighted (or 
smooth) bootstrap, which has also been shown to be computationally more efficient 
in several applications. We may refer to Mason and Newton (1992), Praestgaard 
and Wellner (1993) and del Barrio and Matran (2000). Holmes and Reinert (2004) 
provided new proofs for many known results about the convergence in law of the 
bootstrap distribution to the true distribution of smooth statistics employing the 
techniques based on Stein's method for empirical processes. Note that other vari- 
ations of Efron's bootstrap are studied in Chatterjee and Bose (2005) using the 
term "generalized bootstrap" . The practical usefulness of the more general scheme 
is well-documented in the literature. For a survey of further results on weighted 
bootstrap the reader is referred to Barbe and Bertail (1995). 
The remainder of this paper is organized as follows. In the forthcoming section
we recall the estimation procedure based on $\phi$-divergences. The bootstrap of
$\phi$-divergence estimators is introduced in detail, and its asymptotic properties
are given, in Section 3. In Section 4, we provide some examples explaining the
computation of the $\phi$-divergence estimators. In Section 5, we illustrate how to apply
our results in the context of right censoring. Section 6 provides simulation results in
order to illustrate the performance of the proposed estimators. To avoid interrupting
the flow of the presentation, all mathematical developments are relegated to the
Appendix.

2. Dual divergence based estimates 

The class of dual divergence estimators has been recently introduced by Keziou
(2003) and Broniatowski and Keziou (2009). Recall that the $\phi$-divergence between a
bounded signed measure $Q$ and a probability measure $P$ on the same measurable space,
when $Q$ is absolutely continuous with respect to $P$, is defined by

$$D_\phi(Q,P) := \int \phi\left(\frac{dQ}{dP}\right) dP,$$

where $\phi(\cdot)$ is a convex function from $]-\infty,\infty[$ to $[0,\infty]$ with $\phi(1) = 0$. We will
consider only $\phi$-divergences for which the function $\phi(\cdot)$ is strictly convex and satisfies:
the domain of $\phi(\cdot)$, $\mathrm{dom}\,\phi := \{x \in \mathbb{R} : \phi(x) < \infty\}$, is an interval with end points

$$a_\phi < 1 < b_\phi, \quad \phi(a_\phi) = \lim_{x \downarrow a_\phi} \phi(x) \quad \text{and} \quad \phi(b_\phi) = \lim_{x \uparrow b_\phi} \phi(x).$$

The Kullback-Leibler, modified Kullback-Leibler, $\chi^2$, modified $\chi^2$ and Hellinger
divergences are examples of $\phi$-divergences; they are obtained respectively for
$\phi(x) = x\log x - x + 1$, $\phi(x) = -\log x + x - 1$, $\phi(x) = \frac{1}{2}(x-1)^2$, $\phi(x) = \frac{1}{2}(x-1)^2/x$ and
$\phi(x) = 2(\sqrt{x}-1)^2$. The squared Le Cam distance (sometimes called the Vincze-Le
Cam distance) and $L_1$-error are obtained respectively for

$$\phi(x) = (x-1)^2/(2(x+1)) \quad \text{and} \quad \phi(x) = |x-1|.$$

We extend the definition of these divergences on the whole space of all bounded
signed measures via the extension of the definition of the corresponding $\phi(\cdot)$ functions
on the whole real space $\mathbb{R}$ as follows: when $\phi(\cdot)$ is not well defined on $\mathbb{R}_-$ or
well defined but not convex on $\mathbb{R}$, we set $\phi(x) = +\infty$ for all $x < 0$. Notice that for
the $\chi^2$-divergence, the corresponding $\phi(\cdot)$ function is defined on the whole of $\mathbb{R}$ and strictly
convex. All the above examples are particular cases of the so-called "power divergences",
introduced by Cressie and Read (1984) (see also Liese and Vajda (1987,
Chapter 2); the paper of Rényi (1961) is also to be mentioned here), which are
defined through the class of convex real valued functions, for $\gamma$ in $\mathbb{R}\setminus\{0,1\}$,

$$x \in \mathbb{R}_+^* \mapsto \phi_\gamma(x) := \frac{x^\gamma - \gamma x + \gamma - 1}{\gamma(\gamma-1)}, \qquad (2.1)$$

$\phi_0(x) := -\log x + x - 1$ and $\phi_1(x) := x\log x - x + 1$. (For all $\gamma \in \mathbb{R}$, we define
$\phi_\gamma(0) := \lim_{x \downarrow 0} \phi_\gamma(x)$.) So, the $KL$-divergence is associated to $\phi_1$, the $KL_m$ to $\phi_0$,
the $\chi^2$ to $\phi_2$, the $\chi^2_m$ to $\phi_{-1}$ and the Hellinger distance to $\phi_{1/2}$. In the monograph
by Liese and Vajda (1987) the reader may find detailed ingredients of the modeling
theory as well as surveys of the commonly used divergences.
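As a quick numerical check of how the power family (2.1) recovers the named divergences, the following sketch evaluates $\phi_\gamma$ and compares it with the closed forms quoted above. The function name `phi` and the test points are illustrative assumptions.

```python
import math


def phi(gamma, x):
    """Power-divergence generator phi_gamma(x) for x > 0, with the gamma = 0 and 1 cases."""
    if gamma == 0:
        return -math.log(x) + x - 1.0          # modified Kullback-Leibler
    if gamma == 1:
        return x * math.log(x) - x + 1.0       # Kullback-Leibler
    return (x ** gamma - gamma * x + gamma - 1.0) / (gamma * (gamma - 1.0))


for x in (0.3, 1.0, 2.5):
    chi2 = 0.5 * (x - 1.0) ** 2                    # gamma = 2
    hellinger = 2.0 * (math.sqrt(x) - 1.0) ** 2    # gamma = 1/2
    mod_chi2 = 0.5 * (x - 1.0) ** 2 / x            # gamma = -1
    # Each difference should vanish up to floating-point rounding.
    print(x, phi(2, x) - chi2, phi(0.5, x) - hellinger, phi(-1, x) - mod_chi2)
```

The cases $\gamma = 0$ and $\gamma = 1$ are coded separately because the general expression has a removable singularity there; evaluating `phi(1 + 1e-6, x)` gives a value close to `phi(1, x)`.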

Let $\{P_\theta : \theta \in \Theta\}$ be some identifiable parametric model with $\Theta$ a compact subset of
$\mathbb{R}^d$. Consider the problem of estimation of the unknown true value of the parameter
$\theta_0$ on the basis of an i.i.d. sample $X_1,\ldots,X_n$. We shall assume that the observed
data are from the probability space $(\mathcal{X},\mathcal{A},P_{\theta_0})$. Let $\phi(\cdot)$ be a function of class $C^2$,
strictly convex such that

$$\int \left|\phi'\left(\frac{dP_\theta(x)}{dP_\alpha(x)}\right)\right| dP_\theta(x) < \infty, \quad \forall \alpha \in \Theta. \qquad (2.2)$$

As it is mentioned in Broniatowski and Keziou (2009), if the function $\phi(\cdot)$ satisfies
the following conditions

there exists $0 < \delta < 1$ such that for all $c$ in $[1-\delta, 1+\delta]$,
we can find numbers $c_1, c_2, c_3$ such that (2.3)
$\phi(cx) \le c_1\phi(x) + c_2|x| + c_3$, for all real $x$,

then the assumption (2.2) is satisfied whenever $D_\phi(\theta,\alpha) < \infty$, where $D_\phi(\theta,\alpha)$
stands for the $\phi$-divergence between $P_\theta$ and $P_\alpha$, refer to Broniatowski and Keziou
(2006, Lemma 3.2). Also the real convex functions $\phi(\cdot)$ (2.1), associated with the
class of power divergences, all satisfy the condition (2.2), including all standard divergences.
Under assumption (2.2), using the Fenchel duality technique, the divergence
$D_\phi(\theta,\theta_0)$ can be represented as resulting from an optimization procedure; this result
was elegantly proved in Keziou (2003), Liese and Vajda (2006) and Broniatowski
and Keziou (2009). Broniatowski and Keziou (2006) called it the dual form of a divergence,
due to its connection with convex analysis. According to Liese and Vajda
(2006), under the strict convexity and the differentiability of the function $\phi(\cdot)$, it
holds

$$\phi(t) \ge \phi(s) + \phi'(s)(t-s), \qquad (2.4)$$

where the equality holds only for $s = t$. Let $\theta$ and $\theta_0$ be fixed and put
$t = dP_\theta(x)/dP_{\theta_0}(x)$ and $s = dP_\theta(x)/dP_\alpha(x)$ in (2.4) and then integrate with respect to
$P_{\theta_0}$, to obtain

$$D_\phi(\theta,\theta_0) := \int \phi\left(\frac{dP_\theta}{dP_{\theta_0}}\right) dP_{\theta_0} = \sup_{\alpha\in\Theta} \int h(\theta,\alpha)\, dP_{\theta_0}, \qquad (2.5)$$

where $h(\theta,\alpha,\cdot) : x \mapsto h(\theta,\alpha,x)$ and

$$h(\theta,\alpha,x) := \int \phi'\left(\frac{dP_\theta}{dP_\alpha}\right) dP_\theta - \left[\frac{dP_\theta(x)}{dP_\alpha(x)}\,\phi'\left(\frac{dP_\theta(x)}{dP_\alpha(x)}\right) - \phi\left(\frac{dP_\theta(x)}{dP_\alpha(x)}\right)\right]. \qquad (2.6)$$

Furthermore, the supremum in this display (2.5) is unique and reached in $\alpha = \theta_0$,
independently upon the value of $\theta$. Naturally, a class of estimators of $\theta_0$, called
"dual $\phi$-divergence estimators" (D$\phi$DE's), is defined by

$$\widehat{\alpha}_\phi(\theta) := \arg\sup_{\alpha\in\Theta} P_n h(\theta,\alpha), \quad \theta \in \Theta, \qquad (2.7)$$

where $h(\theta,\alpha)$ is the function defined in (2.6) and, for a measurable function $f(\cdot)$,

$$P_n f := n^{-1}\sum_{i=1}^n f(X_i).$$

The class of estimators $\widehat{\alpha}_\phi(\theta)$ satisfies

$$P_n \frac{\partial}{\partial\alpha} h(\theta,\widehat{\alpha}_\phi(\theta)) = 0. \qquad (2.8)$$

Formula (2.7) defines a family of M-estimators indexed by the function $\phi(\cdot)$ specifying
the divergence and by some instrumental value of the parameter $\theta$. The
$\phi$-divergence estimators are motivated by the fact that a suitable choice of the divergence
may lead to an estimate more robust than the maximum likelihood estimator
(MLE), see Jimenez and Shao (2001). Toma and Broniatowski (2010) studied
the robustness of the D$\phi$DE's through the influence function approach; they treated
numerous examples of location-scale models and gave sufficient conditions for the
robustness of D$\phi$DE's. We recall that the maximum likelihood estimate belongs to
the class of estimates (2.7). Indeed, it is obtained when $\phi(x) = -\log x + x - 1$, that
is, as the dual modified $KL_m$-divergence estimate. Observe that $\phi'(x) = -\frac{1}{x} + 1$ and
$x\phi'(x) - \phi(x) = \log x$, hence

$$\int h(\theta,\alpha)\, dP_n = -\int \log\left(\frac{dP_\theta}{dP_\alpha}\right) dP_n.$$

Keeping in mind definitions (2.7), we get

$$\widehat{\alpha}_{KL_m}(\theta) = \arg\sup_{\alpha} -\int \log\left(\frac{dP_\theta}{dP_\alpha}\right) dP_n = \arg\sup_{\alpha} \int \log(dP_\alpha)\, dP_n = \mathrm{MLE},$$

independently upon $\theta$.

3. Asymptotic properties 

In this section, we shall establish the consistency of bootstrapping under general
conditions in the framework of dual divergence estimation. Define, for a measurable
function $f(\cdot)$,

$$P_n^* f := \frac{1}{n}\sum_{i=1}^n W_{ni} f(X_i),$$

where the $W_{ni}$'s are the bootstrap weights defined on the probability space $(\mathcal{W},\Omega,P_W)$.
In view of (2.7), the bootstrap estimator can be rewritten as

$$\widehat{\alpha}^*_\phi(\theta) := \arg\sup_{\alpha\in\Theta} P_n^* h(\theta,\alpha). \qquad (3.1)$$

The definition of $\widehat{\alpha}^*_\phi(\theta)$ in (3.1) implies that

$$P_n^* \frac{\partial}{\partial\alpha} h(\theta,\widehat{\alpha}^*_\phi(\theta)) = 0. \qquad (3.2)$$

The bootstrap weights $W_{ni}$'s are assumed to belong to the class of exchangeable
bootstrap weights introduced in Praestgaard and Wellner (1993). In the sequel,
the transpose of a vector $x$ will be denoted by $x^\top$. We shall assume the following
conditions.

W.1 The vector $W_n = (W_{n1},\ldots,W_{nn})^\top$ is exchangeable for all $n = 1,2,\ldots$, i.e.,
for any permutation $\pi = (\pi_1,\ldots,\pi_n)$ of $(1,\ldots,n)$, the joint distribution of
$\pi(W_n) = (W_{n\pi_1},\ldots,W_{n\pi_n})^\top$ is the same as that of $W_n$.
W.2 $W_{ni} \ge 0$ for all $n$, $i$ and $\sum_{i=1}^n W_{ni} = n$ for all $n$.
W.3 $\limsup_{n\to\infty} \|W_{n1}\|_{2,1} \le C < \infty$, where

$$\|W_{n1}\|_{2,1} = \int_0^\infty \sqrt{P_W(W_{n1} \ge u)}\, du.$$

W.4

$$\lim_{\lambda\to\infty} \limsup_{n\to\infty} \sup_{t\ge\lambda} t^2 P_W(W_{n1} \ge t) = 0.$$

W.5 $(1/n)\sum_{i=1}^n (W_{ni}-1)^2 \xrightarrow{P_W} c^2 > 0$.

In Efron's nonparametric bootstrap, the bootstrap sample is drawn from the nonparametric
estimate of the true distribution, i.e., the empirical distribution. Thus, it is
easy to show that $W_n \sim \mathrm{Multinomial}(n; n^{-1},\ldots,n^{-1})$ and conditions W.1-W.5 are
satisfied. In general, conditions W.3-W.5 are easily satisfied under some moment
conditions on $W_{n1}$, see Praestgaard and Wellner (1993, Lemma 3.1). In addition
to Efron's nonparametric bootstrap, the sampling schemes that satisfy conditions
W.1-W.5 include the Bayesian bootstrap, multiplier bootstrap, double bootstrap and
urn bootstrap. This list is sufficiently long to indicate that conditions W.1-W.5 are
not unduly restrictive. Notice that the value of $c$ in W.5 is independent of $n$ and
depends on the resampling method, e.g., $c = 1$ for the nonparametric bootstrap and
Bayesian bootstrap, and $c = \sqrt{2}$ for the double bootstrap. A more precise discussion
of this general formulation of the bootstrap can be found in Praestgaard and Wellner
(1993), van der Vaart and Wellner (1996) and Kosorok (2008).
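Conditions W.2 and W.5 are easy to inspect by simulation. The sketch below is only an illustration under stated assumptions: the helper names, the seed and $n = 5000$ are arbitrary choices, the Bayesian bootstrap weights are generated as normalized standard exponential variables (a standard construction of Dirichlet$(1,\ldots,1)$ weights, rescaled to sum to $n$), and normalized Gamma$(4,1)$ variables play the role of the Dirichlet$(n;4,\ldots,4)$ case.

```python
import random


def efron(n, rng):
    """Multinomial(n; 1/n, ..., 1/n) counts, i.e. Efron's bootstrap weights."""
    w = [0] * n
    for _ in range(n):
        w[rng.randrange(n)] += 1
    return w


def iid_normalized(n, draw):
    """Weights n * xi_i / sum(xi) built from i.i.d. positive draws xi_i."""
    xi = [draw() for _ in range(n)]
    total = sum(xi)
    return [n * v / total for v in xi]


def c_squared(w):
    """Empirical version of the limit in W.5: (1/n) * sum (W_ni - 1)^2."""
    return sum((wi - 1.0) ** 2 for wi in w) / len(w)


rng = random.Random(1)   # assumed seed
n = 5000
schemes = {
    "efron": efron(n, rng),
    "bayesian": iid_normalized(n, lambda: rng.expovariate(1.0)),
    "dirichlet4": iid_normalized(n, lambda: rng.gammavariate(4.0, 1.0)),
}
for name, w in schemes.items():
    # Columns: scheme, sum of weights (W.2), nonnegativity (W.2), empirical c^2 (W.5).
    print(name, round(sum(w), 6), min(w) >= 0, round(c_squared(w), 3))
```

For large $n$ the empirical $c^2$ should be near 1 for the first two schemes and near $1/4$ for the third, since normalized Gamma$(4,1)$ weights have squared coefficient of variation $1/4$.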

There exist two sources of randomness for the bootstrapped quantity, i.e., $\widehat{\alpha}^*_\phi(\theta)$:
the first comes from the observed data and the second is due to the resampling done
by the bootstrap, i.e., the random $W_{ni}$'s. Therefore, in order to rigorously state our
main theoretical results for the general bootstrap of $\phi$-divergence estimates, we need
to specify relevant probability spaces and define stochastic orders with respect to
relevant probability measures. Following Cheng and Huang (2010) and Wellner and
Zhan (1996), we shall view $X_i$ as the $i$-th coordinate projection from the canonical
probability space $(\mathcal{X}^\infty,\mathcal{A}^\infty,P^\infty_{\theta_0})$ onto the $i$-th copy of $\mathcal{X}$. For the joint randomness
involved, the product probability space is defined as

$$(\mathcal{X}^\infty,\mathcal{A}^\infty,P^\infty_{\theta_0}) \times (\mathcal{W},\Omega,P_W) = (\mathcal{X}^\infty\times\mathcal{W},\ \mathcal{A}^\infty\times\Omega,\ P^\infty_{\theta_0}\times P_W).$$

Throughout the paper, we assume that the bootstrap weights $W_{ni}$'s are independent
of the data $X_i$'s, thus

$$P_{XW} = P^\infty_{\theta_0} \times P_W.$$

Given a real-valued function $\Delta_n$ defined on the above product probability space, e.g.
$\widehat{\alpha}^*_\phi(\theta)$, we say that $\Delta_n$ is of an order $o^o_{P_W}(1)$ in $P_{\theta_0}$-probability if, for any $\epsilon, \eta > 0$,
as $n \to \infty$,

$$P^o_{\theta_0}\{P^o_{W|X}(|\Delta_n| > \epsilon) > \eta\} \longrightarrow 0, \qquad (3.3)$$

and that $\Delta_n$ is of an order $O^o_{P_W}(1)$ in $P_{\theta_0}$-probability if, for any $\eta > 0$, there exists
a $0 < M < \infty$ such that, as $n \to \infty$,

$$P^o_{\theta_0}\{P^o_{W|X}(|\Delta_n| > M) > \eta\} \longrightarrow 0, \qquad (3.4)$$

where the superscript "o" denotes the outer probability, see van der Vaart and
Wellner (1996) for more details on outer probability measures. For more details on
stochastic orders, the interested reader may refer to Cheng and Huang (2010), in
particular, Lemma 3 of the cited reference.

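To establish the consistency of $\widehat{\alpha}^*_\phi(\theta)$, the following conditions are assumed in our
analysis.

(A.1)

$$P_{\theta_0} h(\theta,\theta_0) > \sup_{\alpha \notin N(\theta_0)} P_{\theta_0} h(\theta,\alpha) \qquad (3.5)$$

for any open set $N(\theta_0) \subset \Theta$ containing $\theta_0$.

(A.2)

$$\sup_{\alpha\in\Theta} |P^*_n h(\theta,\alpha) - P_{\theta_0} h(\theta,\alpha)| \xrightarrow{P^o_{XW}} 0. \qquad (3.6)$$

The following theorem gives the consistency of the bootstrapped estimates $\widehat{\alpha}^*_\phi(\theta)$.

Theorem 3.1. Assume that conditions (A.1) and (A.2) hold. Suppose that conditions
(A.3-5) and W.1-W.5 hold. Then $\widehat{\alpha}^*_\phi(\theta)$ is a consistent estimate of $\theta_0$. That
is

$$\widehat{\alpha}^*_\phi(\theta) \xrightarrow{P^o_W} \theta_0 \quad \text{in } P_{\theta_0}\text{-probability}.$$

The proof of Theorem 3.1 is postponed until §7.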

We need the following definitions, refer to van der Vaart (1998) and van der Vaart
and Wellner (1996) among others. If $\mathcal{F}$ is a class of functions for which we have,
almost surely,

$$\|P_n - P\|_{\mathcal{F}} = \sup_{f\in\mathcal{F}} |P_n f - P f| \to 0,$$

then we say that $\mathcal{F}$ is a $P$-Glivenko-Cantelli class of functions. If $\mathcal{F}$ is a class of
functions for which

$$G_n = \sqrt{n}(P_n - P) \rightsquigarrow G \quad \text{in } \ell^\infty(\mathcal{F}),$$

where $G$ is a mean-zero $P$-Brownian bridge process with (uniformly-) continuous
sample paths with respect to the semi-metric $\rho_P(f,g)$, defined by

$$\rho_P^2(f,g) = \mathrm{Var}_P(f(X) - g(X)),$$

then we say that $\mathcal{F}$ is a $P$-Donsker class of functions. Here

$$\ell^\infty(\mathcal{F}) := \left\{ v : \mathcal{F} \to \mathbb{R} \;\Big|\; \|v\|_{\mathcal{F}} = \sup_{f\in\mathcal{F}} |v(f)| < \infty \right\}$$

and $G$ is a $P$-Brownian bridge process on $\mathcal{F}$ if it is a mean-zero Gaussian process
with covariance function

$$E(G(f)G(g)) = Pfg - (Pf)(Pg).$$

Remark 3.1. • Condition (A.1) is the "well separated" condition; compactness
of the parameter space and the continuity of the divergence imply that
the optimum is well-separated, provided the parametric model is identified,
see van der Vaart (1998, Theorem 5.7).
• Condition (A.2) holds if the class

$$\{h(\theta,\alpha) : \alpha \in \Theta\}$$

is shown to be $P$-Glivenko-Cantelli, by applying van der Vaart and Wellner
(1996, Lemma 3.6.16) and Cheng and Huang (2010, Lemma A.1).

For any fixed $\delta_n > 0$, define the classes of functions $\mathcal{H}_n$ and $\dot{\mathcal{H}}_n$ as

$$\mathcal{H}_n := \left\{ \frac{\partial}{\partial\alpha} h(\theta,\alpha) : \|\alpha - \theta_0\| \le \delta_n \right\} \qquad (3.7)$$

and

$$\dot{\mathcal{H}}_n := \left\{ \frac{\partial^2}{\partial\alpha^2} h(\theta,\alpha) : \|\alpha - \theta_0\| \le \delta_n \right\}. \qquad (3.8)$$

We shall say a class of functions $\mathcal{H} \in M(P_{\theta_0})$ if $\mathcal{H}$ possesses enough measurability
for randomization with i.i.d. multipliers to be possible, i.e., $P_n$ can be randomized,
in other words, we can replace $(\delta_{X_i} - P_{\theta_0})$ by $(W_{ni} - 1)\delta_{X_i}$. It is known that
$\mathcal{H} \in M(P_{\theta_0})$, e.g., if $\mathcal{H}$ is countable, or if $\{P_n\}_n^\infty$ are stochastically separable in $\mathcal{H}$,
or if $\mathcal{H}$ is image admissible Suslin; see Giné and Zinn (1990, pages 853 and 854).
To state our result concerning the asymptotic normality, we shall assume the following
additional conditions.

(A.3) The matrices

$$V := P_{\theta_0} \frac{\partial}{\partial\alpha} h(\theta,\theta_0) \frac{\partial}{\partial\alpha} h(\theta,\theta_0)^\top$$

and

$$S := -P_{\theta_0} \frac{\partial^2}{\partial\alpha^2} h(\theta,\theta_0)$$

are non singular.
(A.4) The class $\mathcal{H}_n \in M(P_{\theta_0}) \cap L_2(P_{\theta_0})$ and is $P$-Donsker.
(A.5) The class $\dot{\mathcal{H}}_n \in M(P_{\theta_0}) \cap L_2(P_{\theta_0})$ and is $P$-Donsker.

Conditions (A.4) and (A.5) ensure that the "size" of the function classes $\mathcal{H}_n$ and
$\dot{\mathcal{H}}_n$ is reasonable so that the bootstrapped empirical processes

$$G^*_n \equiv \sqrt{n}(P^*_n - P_n)$$

indexed, respectively, by $\mathcal{H}_n$ and $\dot{\mathcal{H}}_n$, have a limiting process conditional on the original
observations; we refer for instance to Praestgaard and Wellner (1993, Theorem
2.2). The main result to be proved here may now be stated precisely as follows.

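Theorem 3.2. Assume that $\widehat{\alpha}_\phi(\theta)$ and $\widehat{\alpha}^*_\phi(\theta)$ fulfil (2.8) and (3.2), respectively.
In addition suppose that

$$\widehat{\alpha}_\phi(\theta) \xrightarrow{P_{\theta_0}} \theta_0 \quad \text{and} \quad \widehat{\alpha}^*_\phi(\theta) \xrightarrow{P^o_W} \theta_0 \quad \text{in } P_{\theta_0}\text{-probability}.$$

Assume that conditions (A.3-5) and W.1-W.5 hold. Then we have

$$\|\widehat{\alpha}^*_\phi(\theta) - \theta_0\| = O^o_{P_W}(n^{-1/2}) \qquad (3.9)$$

in $P_{\theta_0}$-probability. Furthermore,

$$\sqrt{n}(\widehat{\alpha}^*_\phi(\theta) - \widehat{\alpha}_\phi(\theta)) = -S^{-1} G^*_n \frac{\partial}{\partial\alpha} h(\theta,\theta_0) + o^o_{P_W}(1) \qquad (3.10)$$

in $P_{\theta_0}$-probability. Consequently,

$$\sup_{x\in\mathbb{R}^d} \left| P_{W|X_n}\big((\sqrt{n}/c)(\widehat{\alpha}^*_\phi(\theta) - \widehat{\alpha}_\phi(\theta)) \le x\big) - P(N(0,\Sigma) \le x) \right| = o_{P_{\theta_0}}(1), \qquad (3.11)$$

where "$\le$" is taken componentwise and "$c$" is given in W.5, whose value depends
on the used sampling scheme, and

$$\Sigma = S^{-1} V (S^{-1})^\top,$$

where $S$ and $V$ are given in condition (A.3). Thus, we have

$$\sup_{x\in\mathbb{R}^d} \left| P_{W|X_n}\big((\sqrt{n}/c)(\widehat{\alpha}^*_\phi(\theta) - \widehat{\alpha}_\phi(\theta)) \le x\big) - P_{\theta_0}\big(\sqrt{n}(\widehat{\alpha}_\phi(\theta) - \theta_0) \le x\big) \right| \xrightarrow{P_{\theta_0}} 0. \qquad (3.12)$$

The proof of Theorem 3.2 is captured in the forthcoming §7.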

Remark 3.2. Note that an appropriate choice of the bootstrap weights $W_{ni}$'s
implies a smaller limit variance, that is, $c^2$ is smaller than 1. For instance,
typical examples are i.i.d.-weighted bootstraps and the multivariate hypergeometric
bootstrap, refer to Praestgaard and Wellner (1993, Examples 3.1 and 3.4).

Following Cheng and Huang (2010), we shall illustrate how to apply our results
to construct the confidence sets. A lower $\epsilon$-th quantile of the bootstrap distribution is
defined to be any $q^*_{n\epsilon} \in \mathbb{R}^d$ fulfilling

$$q^*_{n\epsilon} := \inf\{x : P_{W|X_n}(\widehat{\alpha}^*_\phi(\theta) \le x) \ge \epsilon\},$$

where $x$ is an infimum over the given set only if there does not exist a $x_1 < x$ in $\mathbb{R}^d$
such that

$$P_{W|X_n}(\widehat{\alpha}^*_\phi(\theta) \le x_1) \ge \epsilon.$$

Keeping in mind the assumed regularity conditions on the criterion function, that is,
$h(\theta,\alpha)$ in the present framework, we can, without loss of generality, suppose that

$$P_{W|X_n}(\widehat{\alpha}^*_\phi(\theta) \le q^*_{n\epsilon}) = \epsilon.$$

Making use of the distribution consistency result given in (3.12), we can approximate
the $\epsilon$-th quantile of the distribution of

$$(\widehat{\alpha}_\phi(\theta) - \theta_0) \quad \text{by} \quad (q^*_{n\epsilon} - \widehat{\alpha}_\phi(\theta))/c.$$
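Therefore, we define the percentile-type bootstrap confidence set as

$$C(\epsilon) := \left[ \widehat{\alpha}_\phi(\theta) + \frac{q^*_{n(\epsilon/2)} - \widehat{\alpha}_\phi(\theta)}{c},\ \widehat{\alpha}_\phi(\theta) + \frac{q^*_{n(1-\epsilon/2)} - \widehat{\alpha}_\phi(\theta)}{c} \right]. \qquad (3.13)$$

In a similar manner, the $\epsilon$-th quantile of $\sqrt{n}(\widehat{\alpha}_\phi(\theta) - \theta_0)$ can be approximated by
$\widetilde{q}^*_{n\epsilon}$, where $\widetilde{q}^*_{n\epsilon}$ is the $\epsilon$-th quantile of the hybrid quantity $(\sqrt{n}/c)(\widehat{\alpha}^*_\phi(\theta) - \widehat{\alpha}_\phi(\theta))$,
i.e.,

$$P_{W|X_n}\big((\sqrt{n}/c)(\widehat{\alpha}^*_\phi(\theta) - \widehat{\alpha}_\phi(\theta)) \le \widetilde{q}^*_{n\epsilon}\big) = \epsilon.$$

Note that

$$\widetilde{q}^*_{n\epsilon} = (\sqrt{n}/c)(q^*_{n\epsilon} - \widehat{\alpha}_\phi(\theta)).$$

Thus, the hybrid-type bootstrap confidence set would be defined as follows

$$\widetilde{C}(\epsilon) := \left[ \widehat{\alpha}_\phi(\theta) - \frac{\widetilde{q}^*_{n(1-\epsilon/2)}}{\sqrt{n}},\ \widehat{\alpha}_\phi(\theta) - \frac{\widetilde{q}^*_{n(\epsilon/2)}}{\sqrt{n}} \right]. \qquad (3.14)$$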



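Note that $q^*_{n\epsilon}$ and $\widetilde{q}^*_{n\epsilon}$ are not unique by the fact that we assume $\theta$ is a vector. Recall
that, for any $x \in \mathbb{R}^d$,

$$P_{\theta_0}\big(\sqrt{n}(\widehat{\alpha}_\phi(\theta) - \theta_0) \le x\big) \longrightarrow \Psi(x)$$

and

$$P_{W|X_n}\big((\sqrt{n}/c)(\widehat{\alpha}^*_\phi(\theta) - \widehat{\alpha}_\phi(\theta)) \le x\big) \xrightarrow{P_{\theta_0}} \Psi(x),$$

where

$$\Psi(x) = P(N(0,\Sigma) \le x).$$

According to the quantile convergence theorem, i.e., van der Vaart (1998, Lemma
21.1), we have, almost surely,

$$\widetilde{q}^*_{n\epsilon} \xrightarrow{P_{XW}} \Psi^{-1}(\epsilon).$$

When applying the quantile convergence theorem, we use the almost sure representation,
that is, van der Vaart (1998, Theorem 2.19), and argue along subsequences.
Considering Slutsky's theorem, which ensures that

$$\sqrt{n}(\widehat{\alpha}_\phi(\theta) - \theta_0) - \widetilde{q}^*_{n(\epsilon/2)} \text{ weakly converges to } N(0,\Sigma) - \Psi^{-1}(\epsilon/2),$$

we further have

$$P_{XW}\left(\theta_0 \le \widehat{\alpha}_\phi(\theta) - \frac{\widetilde{q}^*_{n(\epsilon/2)}}{\sqrt{n}}\right) = P_{XW}\big(\sqrt{n}(\widehat{\alpha}_\phi(\theta) - \theta_0) \ge \widetilde{q}^*_{n(\epsilon/2)}\big)$$
$$\longrightarrow P_{XW}\big(N(0,\Sigma) \ge \Psi^{-1}(\epsilon/2)\big) = 1 - \epsilon/2.$$

The above arguments prove the consistency of the hybrid-type bootstrap confidence
set, i.e., (3.16), and can also be applied to the percentile-type bootstrap confidence
set, i.e., (3.15). For an in-depth study and more rigorous proof, we may refer to
van der Vaart (1998, Lemma 23.3). The above discussion may be summarized as
follows.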

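Corollary 3.3. Under the conditions in Theorem 3.2, we have, as $n \to \infty$,

$$P_{XW}\left( \widehat{\alpha}_\phi(\theta) + \frac{q^*_{n(\epsilon/2)} - \widehat{\alpha}_\phi(\theta)}{c} \le \theta_0 \le \widehat{\alpha}_\phi(\theta) + \frac{q^*_{n(1-\epsilon/2)} - \widehat{\alpha}_\phi(\theta)}{c} \right) \longrightarrow 1 - \epsilon, \qquad (3.15)$$

$$P_{XW}\left( \widehat{\alpha}_\phi(\theta) - \frac{\widetilde{q}^*_{n(1-\epsilon/2)}}{\sqrt{n}} \le \theta_0 \le \widehat{\alpha}_\phi(\theta) - \frac{\widetilde{q}^*_{n(\epsilon/2)}}{\sqrt{n}} \right) \longrightarrow 1 - \epsilon. \qquad (3.16)$$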

It is well known that the above bootstrap confidence sets can be obtained easily 
through routine bootstrap sampling. 
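A minimal sketch of such routine sampling is given below, under simplifying assumptions that are not part of the general theory: the model is $N(\theta,1)$, the divergence is the $KL_m$ one (so that the dual estimate is the sample mean and its weighted version is the weighted mean), the weights are Efron's multinomial weights with $c = 1$, and the function names, seed, $B = 1000$ replicates and the empirical quantile rule are all illustrative choices.

```python
import math
import random


def weighted_mean(x, w):
    """Weighted KL_m dual estimate in the N(theta, 1) model: sum W_i X_i / sum W_i."""
    return sum(wi * xi for wi, xi in zip(w, x)) / sum(w)


def efron_weights(n, rng):
    """Multinomial(n; 1/n, ..., 1/n) bootstrap weights."""
    w = [0] * n
    for _ in range(n):
        w[rng.randrange(n)] += 1
    return w


def quantile(sorted_vals, eps):
    """Lower empirical quantile: smallest value whose empirical CDF is >= eps."""
    k = max(0, math.ceil(eps * len(sorted_vals)) - 1)
    return sorted_vals[k]


def bootstrap_sets(x, eps=0.05, B=1000, c=1.0, seed=0):
    rng = random.Random(seed)
    n = len(x)
    est = weighted_mean(x, [1.0] * n)
    boot = sorted(weighted_mean(x, efron_weights(n, rng)) for _ in range(B))
    q_lo, q_hi = quantile(boot, eps / 2), quantile(boot, 1 - eps / 2)
    # Percentile-type set, following (3.13).
    percentile = (est + (q_lo - est) / c, est + (q_hi - est) / c)
    # Hybrid-type set, following (3.14), with q-tilde = (sqrt(n)/c) * (q - est).
    qt_lo = math.sqrt(n) / c * (q_lo - est)
    qt_hi = math.sqrt(n) / c * (q_hi - est)
    hybrid = (est - qt_hi / math.sqrt(n), est - qt_lo / math.sqrt(n))
    return est, percentile, hybrid


rng = random.Random(3)   # assumed seed for the simulated data
x = [rng.gauss(0.0, 1.0) for _ in range(100)]
est, percentile, hybrid = bootstrap_sets(x)
print(est, percentile, hybrid)
```

With $c = 1$ the two sets have the same width; the hybrid set is the reflection of the percentile set around the point estimate. For another weighting scheme one would swap `efron_weights` for the corresponding generator and pass the matching value of $c$.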

Remark 3.3. Notice that the choice of weights depends on the problem at hand:
accuracy of the estimation of the entire distribution of the statistic, accuracy of a
confidence interval, accuracy in the large deviation sense, accuracy for a finite sample
size; we may refer to James (1997) and the references therein for more details.
Barbe and Bertail (1995) indicate that the area where the weighted bootstrap clearly
performs better than the classical bootstrap is in terms of coverage accuracy.

3.1. On the choice of the escort parameter. The very peculiar choice of the
escort parameter defined through $\theta = \theta_0$ has the same limit properties as the MLE one.
The D$\phi$DE $\widehat{\alpha}_\phi(\theta_0)$, in this case, has a variance which indeed coincides with the MLE
one, see for instance Keziou (2003, Theorem 2.2, (1) (b)). This result is of some relevance,
since it leaves open the choice of the divergence, while keeping good asymptotic
properties. For data generated from the distribution $N(0,1)$, Figure 1 shows
that the global maximum of the empirical criterion $P_n h(\widehat{\theta}_n,\alpha)$ is zero, independently
of the value of the escort parameter $\widehat{\theta}_n$ (the sample mean $\overline{X} = n^{-1}\sum_{i=1}^n X_i$
in Figure 1(a) and the median in Figure 1(b)) for all the considered divergences,
which is in agreement with the result of Broniatowski (2011, Theorem 6), where it
is shown that all differentiable divergences produce the same estimator of the parameter
on any regular exponential family, in particular the normal models, which
is the MLE one, provided that the conditions (2.3) and $D_\phi(\theta,\alpha) < \infty$ are satisfied.




FIGURE 1. Criterion for the normal location model. 

Unlike the case of data without contamination, the choice of the escort parameter is
crucial in the estimation method in the presence of outliers. We plot in Figure 2 the
empirical criterion $P_n h(\widehat{\theta}_n,\alpha)$, where the data are generated from the distribution

$$(1-\epsilon)N(\theta_0,1) + \epsilon\delta_{10},$$

where $\epsilon = 0.1$, $\theta_0 = 0$ and $\delta_x$ stands for the Dirac measure at $x$. Under contamination,
when we take the empirical "mean", $\widehat{\theta}_n = \overline{X}$, as the value of the escort
parameter $\theta$, Figure 2(a) shows how the global maximum of the empirical criterion
$P_n h(\widehat{\theta}_n,\alpha)$ shifts from zero to the contamination point. In Figure 2(b), the choice
of the "median" as escort parameter value leads to the position of the global maximum
remaining close to $\alpha = 0$ for the Hellinger ($\gamma = 0.5$), $\chi^2$ ($\gamma = 2$) and $KL$-divergence
($\gamma = 1$), while the criterion associated to the $KL_m$-divergence ($\gamma = 0$, the maximum
is the MLE) is still affected by the presence of outliers.

FIGURE 2. Criterion for the normal location model under contamination.

In practice, the consequence is that if the data are subject to contamination the
escort parameter should be chosen as a robust estimator of $\theta_0$, say $\widehat{\theta}_n$. For more
details about the performances of dual $\phi$-divergence estimators for normal density
models, we refer to Cherfi (2011b).
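The effect of the escort parameter can be reproduced in a small simulation. The sketch below is only an illustration: it uses the closed-form Hellinger ($\gamma = 1/2$) criterion for the $N(\theta,1)$ model given in Section 4.1, and the sample size, the seed, the contamination point 10, the exact 10% share of outliers and the grid search are all assumptions made for this example, not the settings behind Figure 2.

```python
import math
import random
import statistics


def hellinger_criterion(alpha, theta, xs):
    """Empirical dual criterion P_n h(theta, alpha) for N(theta, 1) and gamma = 1/2."""
    d = theta - alpha
    s = sum(math.exp(-0.25 * d * (theta + alpha - 2.0 * x)) for x in xs)
    return -2.0 * math.exp(-0.125 * d * d) - 2.0 * s / len(xs) + 4.0


def grid_argmax(f, lo, hi, step=0.01):
    """Global maximizer of f over an equally spaced grid (crude but robust)."""
    k_max = int(round((hi - lo) / step))
    return max((lo + k * step for k in range(k_max + 1)), key=f)


rng = random.Random(2)                                   # assumed seed
xs = [rng.gauss(0.0, 1.0) for _ in range(270)] + [10.0] * 30

mean_x = sum(xs) / len(xs)
median_x = statistics.median(xs)

est_mean_escort = grid_argmax(lambda a: hellinger_criterion(a, mean_x, xs), -3.0, 12.0)
est_median_escort = grid_argmax(lambda a: hellinger_criterion(a, median_x, xs), -3.0, 12.0)
print(mean_x, median_x, est_mean_escort, est_median_escort)
```

In this sketch the maximizer obtained with the sample mean as escort coincides (up to the grid step) with the sample mean itself, which the outliers pull toward the contamination point, while the maximizer obtained with the median as escort stays closer to $\theta_0 = 0$.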



4. Examples 

Keep in mind the definitions (2.5) and (2.6). In what follows, for easy reference
and completeness, we give some usual examples of divergences, discussed in
Bouzebda and Keziou (2010a,b), and the associated estimates; we
may refer also to Broniatowski and Vajda (2009) for more examples and details.
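• Our first example is the Kullback-Leibler divergence

$$\phi(x) = x\log x - x + 1,$$
$$\phi'(x) = \log x,$$
$$x\phi'(x) - \phi(x) = x - 1.$$

The estimate of $D_{KL}(\theta,\theta_0)$ is given by

$$\widehat{D}_{KL}(\theta,\theta_0) = \sup_{\alpha\in\Theta} \left\{ \int \log\left(\frac{dP_\theta}{dP_\alpha}\right) dP_\theta - \int \left(\frac{dP_\theta}{dP_\alpha} - 1\right) dP_n \right\}$$

and the estimate of the parameter $\theta_0$, with escort parameter $\theta$, is defined as
follows

$$\widehat{\alpha}_{KL}(\theta) := \arg\sup_{\alpha\in\Theta} \left\{ \int \log\left(\frac{dP_\theta}{dP_\alpha}\right) dP_\theta - \int \left(\frac{dP_\theta}{dP_\alpha} - 1\right) dP_n \right\}.$$

• The second one is the $\chi^2$-divergence

$$\phi(x) = \frac{1}{2}(x-1)^2,$$
$$\phi'(x) = x - 1,$$
$$x\phi'(x) - \phi(x) = \frac{1}{2}x^2 - \frac{1}{2}.$$

The estimate of $D_{\chi^2}(\theta,\theta_0)$ is given by

$$\widehat{D}_{\chi^2}(\theta,\theta_0) = \sup_{\alpha\in\Theta} \left\{ \int \left(\frac{dP_\theta}{dP_\alpha} - 1\right) dP_\theta - \frac{1}{2}\int \left(\left(\frac{dP_\theta}{dP_\alpha}\right)^2 - 1\right) dP_n \right\}$$

and the estimate of the parameter $\theta_0$, with escort parameter $\theta$, is defined by

$$\widehat{\alpha}_{\chi^2}(\theta) := \arg\sup_{\alpha\in\Theta} \left\{ \int \left(\frac{dP_\theta}{dP_\alpha} - 1\right) dP_\theta - \frac{1}{2}\int \left(\left(\frac{dP_\theta}{dP_\alpha}\right)^2 - 1\right) dP_n \right\}.$$

• Another example is the Hellinger divergence

$$\phi(x) = 2(\sqrt{x} - 1)^2,$$
$$\phi'(x) = 2 - \frac{2}{\sqrt{x}},$$
$$x\phi'(x) - \phi(x) = 2\sqrt{x} - 2.$$

The estimate of $D_H(\theta,\theta_0)$ is given by

$$\widehat{D}_H(\theta,\theta_0) = \sup_{\alpha\in\Theta} \left\{ \int \left(2 - 2\sqrt{\frac{dP_\alpha}{dP_\theta}}\right) dP_\theta - \int 2\left(\sqrt{\frac{dP_\theta}{dP_\alpha}} - 1\right) dP_n \right\}$$

and the estimate of the parameter $\theta_0$, with escort parameter $\theta$, is defined by

$$\widehat{\alpha}_H(\theta) := \arg\sup_{\alpha\in\Theta} \left\{ \int \left(2 - 2\sqrt{\frac{dP_\alpha}{dP_\theta}}\right) dP_\theta - \int 2\left(\sqrt{\frac{dP_\theta}{dP_\alpha}} - 1\right) dP_n \right\}.$$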

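• All the above examples are particular cases of the so-called "power divergences",
which are defined through the class of convex real valued functions,
for $\gamma$ in $\mathbb{R}\setminus\{0,1\}$,

$$x \in \mathbb{R}^*_+ \mapsto \phi_\gamma(x) := \frac{x^\gamma - \gamma x + \gamma - 1}{\gamma(\gamma-1)}.$$

The estimate of $D_\gamma(\theta,\theta_0)$ is given by

$$\widehat{D}_\gamma(\theta,\theta_0) = \sup_{\alpha\in\Theta} \left\{ \int \frac{1}{\gamma-1}\left[\left(\frac{dP_\theta}{dP_\alpha}\right)^{\gamma-1} - 1\right] dP_\theta - \int \frac{1}{\gamma}\left[\left(\frac{dP_\theta}{dP_\alpha}\right)^{\gamma} - 1\right] dP_n \right\} \qquad (4.1)$$

and the parameter estimate is defined by

$$\widehat{\alpha}_\gamma(\theta) := \arg\sup_{\alpha\in\Theta} \left\{ \int \frac{1}{\gamma-1}\left[\left(\frac{dP_\theta}{dP_\alpha}\right)^{\gamma-1} - 1\right] dP_\theta - \int \frac{1}{\gamma}\left[\left(\frac{dP_\theta}{dP_\alpha}\right)^{\gamma} - 1\right] dP_n \right\}. \qquad (4.2)$$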

Remark 4.1. The computation of the estimate $\widehat{\alpha}_\phi(\theta)$ requires calculus of the integral
in the formula (2.6). This integral can be explicitly calculated for the most
standard parametric models. Below, we give a closed-form expression for Normal,
log-Normal, Exponential, Gamma, Weibull and Pareto density models. Hence, the
computation of $\widehat{\alpha}_\phi(\theta)$ can be performed by any standard non linear optimization
code. Unfortunately, the explicit formula of $\widehat{\alpha}_\phi(\theta)$, generally, can not be derived,
which also is the case for the ML method. In practical problems, to obtain the estimate
$\widehat{\alpha}_\phi(\theta)$, one can use the Newton-Raphson algorithm taking as initial point the
escort parameter $\theta$. This algorithm, a powerful technique for solving equations
numerically, performs well since the objective functions $\alpha \in \Theta \mapsto P_{\theta_0} h(\theta,\alpha)$ are
concave and the estimated parameter is unique for functions $\alpha \in \Theta \mapsto P_n h(\theta,\alpha)$,
for instance, refer to Broniatowski and Keziou (2009, Remark 3.5).
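4.1. Example of normal density. Consider the case of power divergences and
the normal model

$$\{N(\theta,\sigma^2) : (\theta,\sigma^2) \in \Theta = \mathbb{R}\times\mathbb{R}^*_+\}.$$

Set

$$p_{\theta,\sigma}(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{1}{2}\left(\frac{x-\theta}{\sigma}\right)^2\right\}.$$

Simple calculus gives, for $\gamma$ in $\mathbb{R}\setminus\{0,1\}$,

$$\frac{1}{\gamma-1}\int \left(\frac{dP_{\theta,\sigma_1}(x)}{dP_{\alpha,\sigma_2}(x)}\right)^{\gamma-1} dP_{\theta,\sigma_1}(x) = \frac{1}{\gamma-1}\, \frac{\sigma_1^{-(\gamma-1)}\sigma_2^{\gamma}}{\sqrt{\gamma\sigma_2^2 - (\gamma-1)\sigma_1^2}} \exp\left\{\frac{\gamma(\gamma-1)(\theta-\alpha)^2}{2(\gamma\sigma_2^2 - (\gamma-1)\sigma_1^2)}\right\}.$$

This yields

$$\widehat{D}_\gamma((\theta,\sigma_1),(\theta_0,\sigma_0)) = \sup_{\alpha,\sigma_2} \left\{ \frac{1}{\gamma-1}\, \frac{\sigma_1^{-(\gamma-1)}\sigma_2^{\gamma}}{\sqrt{\gamma\sigma_2^2 - (\gamma-1)\sigma_1^2}} \exp\left\{\frac{\gamma(\gamma-1)(\theta-\alpha)^2}{2(\gamma\sigma_2^2 - (\gamma-1)\sigma_1^2)}\right\} \right.$$
$$\left. - \frac{1}{\gamma n}\sum_{i=1}^n \left(\frac{\sigma_2}{\sigma_1}\right)^{\gamma} \exp\left\{-\frac{\gamma}{2}\left(\left(\frac{X_i-\theta}{\sigma_1}\right)^2 - \left(\frac{X_i-\alpha}{\sigma_2}\right)^2\right)\right\} - \frac{1}{\gamma(\gamma-1)} \right\}.$$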

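In the particular case $P_\theta \equiv N(\theta,1)$, it follows that, for $\gamma \in \mathbb{R}\setminus\{0,1\}$,

$$\widehat{D}_\gamma(\theta,\theta_0) := \sup_{\alpha} \int h(\theta,\alpha)\, dP_n$$
$$= \sup_{\alpha} \left\{ \frac{1}{\gamma-1} \exp\left\{\frac{\gamma(\gamma-1)(\theta-\alpha)^2}{2}\right\} - \frac{1}{\gamma n}\sum_{i=1}^n \exp\left\{-\frac{\gamma}{2}(\theta-\alpha)(\theta+\alpha-2X_i)\right\} - \frac{1}{\gamma(\gamma-1)} \right\}.$$

For $\gamma = 0$,

$$\widehat{D}_{KL_m}(\theta,\theta_0) := \sup_{\alpha} \int h(\theta,\alpha)\, dP_n = \sup_{\alpha} \left\{ \frac{1}{2n}\sum_{i=1}^n (\theta-\alpha)(\theta+\alpha-2X_i) \right\},$$

which leads to the maximum likelihood estimate independently upon $\theta$.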
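For $\gamma = 1$,

$$\widehat{D}_{KL}(\theta,\theta_0) := \sup_{\alpha} \int h(\theta,\alpha)\, dP_n = \sup_{\alpha} \left\{ \frac{1}{2}(\theta-\alpha)^2 - \frac{1}{n}\sum_{i=1}^n \exp\left\{-\frac{1}{2}(\theta-\alpha)(\theta+\alpha-2X_i)\right\} + 1 \right\}.$$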
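The closed forms for the $N(\theta,1)$ model are straightforward to optimize numerically. The following is a minimal sketch, assuming a golden-section search in place of the Newton-Raphson iteration and taking the sample mean as escort parameter; the names `criterion` and `argmax_golden`, the seed, the sample size and the search bracket are illustrative choices.

```python
import math
import random


def criterion(alpha, theta, xs, gamma):
    """Empirical dual criterion P_n h(theta, alpha) for the N(theta, 1) model."""
    n = len(xs)
    d = theta - alpha
    if gamma == 0:      # modified Kullback-Leibler
        return sum(d * (theta + alpha - 2.0 * x) for x in xs) / (2.0 * n)
    if gamma == 1:      # Kullback-Leibler
        s = sum(math.exp(-0.5 * d * (theta + alpha - 2.0 * x)) for x in xs)
        return 0.5 * d * d - s / n + 1.0
    s = sum(math.exp(-0.5 * gamma * d * (theta + alpha - 2.0 * x)) for x in xs)
    return (math.exp(0.5 * gamma * (gamma - 1.0) * d * d) / (gamma - 1.0)
            - s / (gamma * n) - 1.0 / (gamma * (gamma - 1.0)))


def argmax_golden(f, lo, hi, tol=1e-8):
    """Golden-section search for the maximizer of a unimodal function on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) > f(d):
            b = d       # the maximizer lies in [a, d]
        else:
            a = c       # the maximizer lies in [c, b]
    return 0.5 * (a + b)


rng = random.Random(5)   # assumed seed
xs = [rng.gauss(0.0, 1.0) for _ in range(200)]
mean_x = sum(xs) / len(xs)

estimates = {
    gamma: argmax_golden(lambda a: criterion(a, mean_x, xs, gamma), mean_x - 2.0, mean_x + 2.0)
    for gamma in (0, 0.5, 1)
}
print(mean_x, estimates)
```

With the sample mean as escort, the first-order condition (2.8) is satisfied at $\alpha = \overline{X}$ for each of these divergences, so the three numerical maximizers should agree with the sample mean up to the search tolerance.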



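4.2. Example of log-normal density. Consider the case of power divergences
and the log-normal model

$$\left\{ p_{\theta,\sigma}(x) = \frac{1}{x\sigma\sqrt{2\pi}} \exp\left\{-\frac{1}{2}\left(\frac{\log(x)-\theta}{\sigma}\right)^2\right\} : (\theta,\sigma^2) \in \Theta = \mathbb{R}\times\mathbb{R}^*_+,\ x > 0 \right\}.$$

Simple calculus gives, for $\gamma$ in $\mathbb{R}\setminus\{0,1\}$,

$$\frac{1}{\gamma-1}\int \left(\frac{dP_{\theta,\sigma_1}(x)}{dP_{\alpha,\sigma_2}(x)}\right)^{\gamma-1} dP_{\theta,\sigma_1}(x) = \frac{1}{\gamma-1}\, \frac{\sigma_1^{-(\gamma-1)}\sigma_2^{\gamma}}{\sqrt{\gamma\sigma_2^2 - (\gamma-1)\sigma_1^2}} \exp\left\{\frac{\gamma(\gamma-1)(\theta-\alpha)^2}{2(\gamma\sigma_2^2 - (\gamma-1)\sigma_1^2)}\right\}.$$

This yields

$$\widehat{D}_\gamma((\theta,\sigma_1),(\theta_0,\sigma_0)) = \sup_{\alpha,\sigma_2} \left\{ \frac{1}{\gamma-1}\, \frac{\sigma_1^{-(\gamma-1)}\sigma_2^{\gamma}}{\sqrt{\gamma\sigma_2^2 - (\gamma-1)\sigma_1^2}} \exp\left\{\frac{\gamma(\gamma-1)(\theta-\alpha)^2}{2(\gamma\sigma_2^2 - (\gamma-1)\sigma_1^2)}\right\} \right.$$
$$\left. - \frac{1}{\gamma n}\sum_{i=1}^n \left(\frac{\sigma_2}{\sigma_1}\right)^{\gamma} \exp\left\{-\frac{\gamma}{2}\left(\left(\frac{\log(X_i)-\theta}{\sigma_1}\right)^2 - \left(\frac{\log(X_i)-\alpha}{\sigma_2}\right)^2\right)\right\} - \frac{1}{\gamma(\gamma-1)} \right\}.$$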



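4.3. Example of exponential density. Consider the case of power divergences
and the exponential model

$$\left\{ p_\theta(x) = \theta\exp(-\theta x) : \theta \in \Theta = \mathbb{R}_+^* \right\}.$$

We have, for $\gamma$ in $\mathbb{R}\setminus\{0, 1\}$,

$$\frac{1}{\gamma-1}\int \left( \frac{d\mathbb{P}_\theta(x)}{d\mathbb{P}_\alpha(x)} \right)^{\gamma-1} d\mathbb{P}_\theta(x)
= \left( \frac{\theta}{\alpha} \right)^{\gamma-1} \frac{\theta}{\theta\gamma(\gamma-1)-\alpha(\gamma-1)^2}.$$

Then using this last equality, one finds

$$\widehat{D}_\gamma(\theta,\theta_0) = \sup_{\alpha}\left\{ \left( \frac{\theta}{\alpha} \right)^{\gamma-1} \frac{\theta}{\theta\gamma(\gamma-1)-\alpha(\gamma-1)^2}
- \frac{1}{\gamma n}\sum_{i=1}^{n}\left( \frac{\theta}{\alpha} \right)^{\gamma}\exp\{-\gamma(\theta-\alpha)X_i\}
- \frac{1}{\gamma(\gamma-1)} \right\}.$$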
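The closed form for the exponential model can be checked numerically. The sketch below compares it with a trapezoidal approximation of the integral; the function names, the truncation point `upper` and the number of steps are assumptions chosen for illustration, and the comparison is only meaningful when $\gamma\theta - (\gamma-1)\alpha > 0$, so that the integral converges.

```python
import math


def closed_form(theta, alpha, gamma):
    """Closed-form value of the normalised integral for the exponential model."""
    denom = theta * gamma * (gamma - 1.0) - alpha * (gamma - 1.0) ** 2
    return (theta / alpha) ** (gamma - 1.0) * theta / denom


def numeric(theta, alpha, gamma, upper=80.0, steps=100_000):
    """Trapezoidal approximation of the same quantity on [0, upper]."""
    def integrand(x):
        # Density ratio p_theta(x) / p_alpha(x) for exponential densities.
        ratio = (theta / alpha) * math.exp(-(theta - alpha) * x)
        return ratio ** (gamma - 1.0) * theta * math.exp(-theta * x)

    h = upper / steps
    total = 0.5 * (integrand(0.0) + integrand(upper))
    for i in range(1, steps):
        total += integrand(i * h)
    return total * h / (gamma - 1.0)


if __name__ == "__main__":
    for g in (-1.0, 0.5, 2.0):
        print(g, closed_form(1.0, 1.5, g), numeric(1.0, 1.5, g))
```

The two columns printed for each $\gamma$ should agree to several decimal places; a visible discrepancy would point to a truncation point that is too small for the chosen parameters.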

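More generally, we may consider the gamma density combined with the power
divergence. The Gamma model is defined by

$$\left\{ p_\theta(x;k) := \frac{\theta^k x^{k-1}\exp(-\theta x)}{\Gamma(k)} : k, \theta > 0 \right\},$$

where $\Gamma(\cdot)$ is the Gamma function

$$\Gamma(k) := \int_0^\infty x^{k-1}\exp(-x)\,dx.$$

Simple calculus gives, for $\gamma$ in $\mathbb{R}\setminus\{0, 1\}$,

$$\frac{1}{\gamma-1}\int \left( \frac{d\mathbb{P}_{\theta;k}(x)}{d\mathbb{P}_{\alpha;k}(x)} \right)^{\gamma-1} d\mathbb{P}_{\theta;k}(x)
= \left( \frac{\theta}{\alpha} \right)^{k(\gamma-1)} \left( \frac{\theta}{\theta\gamma-\alpha(\gamma-1)} \right)^{k} \frac{1}{\gamma-1},$$

which implies that

$$\widehat{D}_\gamma(\theta,\theta_0) = \sup_{\alpha}\left\{ \left( \frac{\theta}{\alpha} \right)^{k(\gamma-1)} \left( \frac{\theta}{\theta\gamma-\alpha(\gamma-1)} \right)^{k} \frac{1}{\gamma-1}
- \frac{1}{\gamma n}\sum_{i=1}^{n}\left( \frac{\theta}{\alpha} \right)^{k\gamma}\exp\{-\gamma((\theta X_i)-(\alpha X_i))\}
- \frac{1}{\gamma(\gamma-1)} \right\}.$$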



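4.4. Example of Weibull density. Consider the case of power divergences and
the Weibull density model, with the assumption that $k \in \mathbb{R}_+^*$ is known and $\theta$ is the
parameter of interest to be estimated; recall that

$$\left\{ p_\theta(x) = \frac{k}{\theta}\left( \frac{x}{\theta} \right)^{k-1}\exp\left( -\left( \frac{x}{\theta} \right)^{k} \right) : \theta \in \Theta = \mathbb{R}_+^*,\ x \ge 0 \right\}.$$

Routine algebra gives, for $\gamma$ in $\mathbb{R}\setminus\{0, 1\}$,

$$\frac{1}{\gamma-1}\int \left( \frac{d\mathbb{P}_\theta(x)}{d\mathbb{P}_\alpha(x)} \right)^{\gamma-1} d\mathbb{P}_\theta(x)
= \left( \frac{\alpha}{\theta} \right)^{k(\gamma-1)} \frac{1}{\gamma-\left( \frac{\theta}{\alpha} \right)^{k}(\gamma-1)}\, \frac{1}{\gamma-1},$$

which implies that

$$\widehat{D}_\gamma(\theta,\theta_0) = \sup_{\alpha}\left\{ \left( \frac{\alpha}{\theta} \right)^{k(\gamma-1)} \frac{1}{\gamma-\left( \frac{\theta}{\alpha} \right)^{k}(\gamma-1)}\, \frac{1}{\gamma-1}
- \frac{1}{\gamma n}\sum_{i=1}^{n}\left( \frac{\alpha}{\theta} \right)^{k\gamma}\exp\left\{ -\gamma\left( \left( \frac{X_i}{\theta} \right)^{k} - \left( \frac{X_i}{\alpha} \right)^{k} \right) \right\}
- \frac{1}{\gamma(\gamma-1)} \right\}.$$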

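4.5. Example of the Pareto density. Consider the case of power divergences
and the Pareto density

$$\left\{ p_\theta(x) := \frac{\theta}{x^{\theta+1}} : x > 1;\ \theta \in \mathbb{R}_+^* \right\}.$$

Simple calculus gives, for $\gamma$ in $\mathbb{R}\setminus\{0, 1\}$,

$$\frac{1}{\gamma-1}\int \left( \frac{d\mathbb{P}_\theta(x)}{d\mathbb{P}_\alpha(x)} \right)^{\gamma-1} d\mathbb{P}_\theta(x)
= \left( \frac{\theta}{\alpha} \right)^{\gamma-1} \frac{\theta}{\theta\gamma(\gamma-1)-\alpha(\gamma-1)^2}. \tag{4.4}$$

As before, using this last equality, one finds

$$\widehat{D}_\gamma(\theta,\theta_0) = \sup_{\alpha}\left\{ \left( \frac{\theta}{\alpha} \right)^{\gamma-1} \frac{\theta}{\theta\gamma(\gamma-1)-\alpha(\gamma-1)^2}
- \frac{1}{\gamma n}\sum_{i=1}^{n}\left( \frac{\theta}{\alpha} \right)^{\gamma} X_i^{-\gamma(\theta-\alpha)}
- \frac{1}{\gamma(\gamma-1)} \right\}.$$

For $\gamma = 0$,

$$\widehat{D}_{KL_m}(\theta,\theta_0) := \sup_{\alpha} \int h(\theta,\alpha)\,d\mathbb{P}_n
= \sup_{\alpha}\left\{ -\frac{1}{n}\sum_{i=1}^{n}\left[ \log\left( \frac{\theta}{\alpha} \right) - (\theta-\alpha)\log(X_i) \right] \right\},$$

which leads to the maximum likelihood estimate, given by

$$\left( \frac{1}{n}\sum_{i=1}^{n}\log(X_i) \right)^{-1},$$

independently of $\theta$.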
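The $\gamma = 0$ case for the Pareto model is easy to check by simulation. The following sketch draws Pareto data by inverse transform, evaluates the $\gamma = 0$ criterion and compares its value at the closed-form maximiser with nearby values of $\alpha$; all function names and the chosen sample size are illustrative assumptions.

```python
import math
import random


def pareto_sample(theta, n, rng):
    """Inverse-transform draws from the density theta / x**(theta + 1), x > 1."""
    return [(1.0 - rng.random()) ** (-1.0 / theta) for _ in range(n)]


def klm_objective(alpha, theta, xs):
    """gamma = 0 criterion: -(1/n) * sum[log(theta/alpha) - (theta - alpha) * log(X_i)]."""
    n = len(xs)
    return -sum(math.log(theta / alpha) - (theta - alpha) * math.log(x) for x in xs) / n


def pareto_mle(xs):
    """Closed-form maximiser in alpha: n / sum(log X_i)."""
    return len(xs) / sum(math.log(x) for x in xs)


if __name__ == "__main__":
    rng = random.Random(42)
    data = pareto_sample(3.0, 20000, rng)
    mle = pareto_mle(data)
    for escort in (1.0, 3.0, 5.0):
        # The maximiser should not depend on the escort value.
        print(escort, mle, klm_objective(mle, escort, data))
```

Changing the escort value shifts the criterion by a term that does not affect where the maximum in $\alpha$ is attained, which is the independence property stated in the text.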



Remark 4.2. The choice of divergence, i.e., the statistical criterion, depends crucially
on the problem at hand. For example, among the various divergences, the $\chi^2$-divergence
is more appropriate in nonstandard problems (e.g., boundary problem estimation).
The idea is to include the parameter domain into an enlarged space,
say $\Theta_e$, in order to render the boundary value an interior point of the new parameter
space $\Theta_e$. Indeed, the Kullback-Leibler, modified Kullback-Leibler, modified $\chi^2$ and
Hellinger divergences are infinite when $d\mathbb{Q}/d\mathbb{P}$ takes negative values on a non-negligible
(with respect to $\mathbb{P}$) subset of the support of $\mathbb{P}$, since the corresponding $\phi(\cdot)$
is infinite on $(-\infty, 0)$, when $\theta$ belongs to $\Theta_e \setminus \Theta$. This problem does not arise in
the case of the $\chi^2$-divergence; in fact, the corresponding $\phi(\cdot)$ is finite on $\mathbb{R}$. For more
details refer to Bouzebda and Keziou (2008, 2010a,b); consult also Broniatowski
and Keziou (2009) and Broniatowski and Leorato (2006) for related matters. It is
well known that when the underlying model is misspecified or when the data are
contaminated, the maximum likelihood or other classical parametric methods may be
severely affected and lead to very poor results. Therefore, robust methods, which
automatically circumvent the contamination effects and model misspecification, can be
used to provide a compromise between efficient classical parametric methods and the
semi-parametric approach, provided they are reasonably efficient at the model; this
problem has been investigated in Basu et al. (1998, 2006). In Bouzebda and Keziou
(2010a, b), simulation results show that the choice of $\phi$-divergence has good properties
in terms of efficiency-robustness. We mention that some progress has been
made on automatic data-based selection of the tuning parameter $\alpha > 0$, appearing
in formula (1) of Basu et al. (2006); the interested reader is referred to Hong and
Kim (2001) and Warwick and Jones (2005). It is mentioned in Tsukahara (2005),
where semiparametric minimum distance estimators are considered, that the MLE
or inversion-type estimators involve solving a nonlinear equation which depends on
some initial value. The second difficulty is that the objective function is not convex
in $\theta$, in general, which gives rise to multiple roots. Thus, in general, "good"
consistent initial estimates are necessary and the D$\phi$DE should serve that purpose.



5. Random right censoring 

Let $T = T_1, \ldots, T_n$ be i.i.d. survival times with continuous survival function
$1 - F_{\theta_0}(\cdot) = 1 - \mathbb{P}_{\theta_0}(T \le \cdot)$ and $C_1, \ldots, C_n$ be independent censoring times with d.f.
$G(\cdot)$. In the censoring set-up, we observe only the pair $Y_i = \min(T_i, C_i)$ and
$\delta_i = \mathbb{1}\{T_i \le C_i\}$, where $\mathbb{1}\{\cdot\}$ is the indicator function of the event $\{\cdot\}$, which designates
whether an observation has been censored or not. Let $(Y_1, \delta_1), \ldots, (Y_n, \delta_n)$ denote
the observed data points and

$$t(1) < t(2) < \cdots < t(k)$$

be the $k$ distinct death times. Now define the death set and risk set as follows, for
$j = 1, \ldots, k$,

$$D(j) := \{ i : y_i = t(j),\ \delta_i = 1 \}$$

and

$$R(j) := \{ i : y_i \ge t(j) \}.$$

The Kaplan and Meier (1958) estimator of $1 - F_{\theta_0}(\cdot)$, denoted here by $1 - \widehat{F}_n(\cdot)$,
may be written as follows

$$1 - \widehat{F}_n(t) := \prod_{j=1}^{k}\left( 1 - \frac{\sum_{q \in D(j)} 1}{\sum_{q \in R(j)} 1} \right)^{\mathbb{1}\{T_{(j)} \le t\}}.$$

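One may define a generally exchangeable weighted bootstrap scheme for the Kaplan-Meier
estimator and related functionals as follows, cf. James (1997, p. 1598),

$$1 - \widehat{F}^*_n(t) := \prod_{j=1}^{k}\left( 1 - \frac{\sum_{q \in D(j)} W_{nq}}{\sum_{q \in R(j)} W_{nq}} \right)^{\mathbb{1}\{T_{(j)} \le t\}}.$$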
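A minimal sketch of the weighted Kaplan-Meier product is given below, assuming the product runs over the distinct death times not exceeding $t$. The function name and argument layout are hypothetical; with all weights equal, the routine reduces to the ordinary Kaplan-Meier estimate.

```python
def weighted_km_survival(t, ys, deltas, weights):
    """Weighted Kaplan-Meier survival estimate at time t.

    ys: observed times, deltas: 1 for a death and 0 for a censored time,
    weights: nonnegative bootstrap weights, one per observation.
    """
    death_times = sorted({y for y, d in zip(ys, deltas) if d == 1})
    surv = 1.0
    for tj in death_times:
        if tj > t:
            break
        # Total weight of the death set D(j) and of the risk set R(j).
        dead = sum(w for y, d, w in zip(ys, deltas, weights) if y == tj and d == 1)
        at_risk = sum(w for y, w in zip(ys, weights) if y >= tj)
        surv *= 1.0 - dead / at_risk
    return surv


if __name__ == "__main__":
    times = [1.0, 2.0, 3.0, 4.0]
    status = [1, 0, 1, 1]
    print(weighted_km_survival(3.0, times, status, [1.0] * 4))
    print(weighted_km_survival(3.0, times, status, [2.0, 1.0, 1.0, 4.0]))
```

Because only ratios of weight sums enter the product, rescaling all weights by a common positive factor leaves the estimate unchanged.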

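Let $\psi$ be $F_{\theta_0}$-integrable and put

$$\Psi^*_n := \int \psi(u)\,d\mathbb{P}^*_n(u) = \sum_{j=1}^{k} \Upsilon_{jn}\,\psi(T_{(j)}),$$

where

$$\Upsilon_{jn} := \left( \frac{\sum_{q \in D(j)} W_{nq}}{\sum_{q \in R(j)} W_{nq}} \right) \prod_{k=1}^{j-1}\left( 1 - \frac{\sum_{q \in D(k)} W_{nq}}{\sum_{q \in R(k)} W_{nq}} \right).$$

Note that we have used the following identity. Let $a_i$, $i = 1, \ldots, k$, and $b_i$, $i = 1, \ldots, k$,
be real numbers; then

$$\prod_{i=1}^{k} a_i - \prod_{i=1}^{k} b_i = \sum_{i=1}^{k} (a_i - b_i) \prod_{j=1}^{i-1} b_j \prod_{h=i+1}^{k} a_h.$$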

In a similar way, we define a more appropriate representation, that will be used
in the sequel, as follows

$$\int \psi(u)\,d\mathbb{P}^*_n(u) = \sum_{j=1}^{n} \pi_{jn}\,\psi(Y_{j:n}),$$

where, for $1 \le j \le n$,

$$\pi_{jn} := \left( \frac{\sum_{q \in D(j)} W_{nq}}{\sum_{q \in R(j)} W_{nq}} \right) \prod_{k=1}^{j-1}\left( 1 - \frac{\sum_{q \in D(k)} W_{nq}}{\sum_{q \in R(k)} W_{nq}} \right).$$

Here, $Y_{1:n} \le \cdots \le Y_{n:n}$ are the ordered $Y$-values and $\delta_{i:n}$ denotes the concomitant
associated with $Y_{i:n}$. Hence we may write (with $\delta_{y}$ the Dirac measure at $y$)

$$\mathbb{P}^*_n := \sum_{j=1}^{n} \pi_{jn}\,\delta_{Y_{j:n}}. \tag{5.1}$$



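For the right censoring situation, the bootstrap D$\phi$DE's are defined by replacing $\mathbb{P}_n$
in (2.7) by $\mathbb{P}^*_n$, that is

$$\widehat{\alpha}_n(\theta) := \arg\sup_{\alpha \in \Theta} \int h(\theta,\alpha)\,d\mathbb{P}^*_n, \quad \theta \in \Theta. \tag{5.2}$$

The corresponding estimating equation for the unknown parameter is then given by

$$\frac{\partial}{\partial\alpha} \int h(\theta,\alpha)\,d\mathbb{P}^*_n = 0, \tag{5.3}$$

where we recall that

$$h(\theta,\alpha,x) := \int \phi'\left( \frac{d\mathbb{P}_\theta}{d\mathbb{P}_\alpha} \right) d\mathbb{P}_\theta
- \left[ \frac{d\mathbb{P}_\theta(x)}{d\mathbb{P}_\alpha(x)}\,\phi'\left( \frac{d\mathbb{P}_\theta(x)}{d\mathbb{P}_\alpha(x)} \right) - \phi\left( \frac{d\mathbb{P}_\theta(x)}{d\mathbb{P}_\alpha(x)} \right) \right].$$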



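Formula (5.2) defines a family of M-estimators for censored data. In the case of the
power divergences family (2.1), it follows from (4.1) that

$$\int h(\theta,\alpha)\,d\widehat{\mathbb{P}}_n
= \frac{1}{\gamma-1}\int \left( \frac{d\mathbb{P}_\theta}{d\mathbb{P}_\alpha} \right)^{\gamma-1} d\mathbb{P}_\theta
- \frac{1}{\gamma}\sum_{j=1}^{n} \omega_{jn}\left[ \left( \frac{d\mathbb{P}_\theta}{d\mathbb{P}_\alpha}(Y_{j:n}) \right)^{\gamma} - 1 \right]
- \frac{1}{\gamma-1},$$

where

$$\widehat{\mathbb{P}}_n := \sum_{j=1}^{n} \omega_{jn}\,\delta_{Y_{j:n}}$$

and, for $1 \le j \le n$,

$$\omega_{jn} = \frac{\delta_{j:n}}{n-j+1} \prod_{i=1}^{j-1}\left( \frac{n-i}{n-i+1} \right)^{\delta_{i:n}}.$$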
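The weights attached to the ordered observations can be computed in a single pass. The sketch below assumes the standard Kaplan-Meier weight formula $\omega_{jn} = \frac{\delta_{j:n}}{n-j+1}\prod_{i<j}\big(\frac{n-i}{n-i+1}\big)^{\delta_{i:n}}$ and no ties among the observed times; the function name `km_weights` is hypothetical.

```python
def km_weights(ys, deltas):
    """Kaplan-Meier weights for the ordered observations Y_{1:n} <= ... <= Y_{n:n}.

    Returns the ordered times and the weights omega_{jn}; ties are assumed absent.
    """
    n = len(ys)
    pairs = sorted(zip(ys, deltas))  # order by observed time, carrying the indicators
    weights = []
    prod = 1.0                       # running product over the indices i < j
    for j, (_, d) in enumerate(pairs, start=1):
        weights.append(d / (n - j + 1) * prod)
        prod *= ((n - j) / (n - j + 1)) ** d
    return [y for y, _ in pairs], weights


if __name__ == "__main__":
    ordered, omega = km_weights([3.0, 1.0, 4.0, 2.0], [1, 1, 1, 0])
    print(ordered, omega, sum(omega))
```

Censored observations receive weight zero, and without censoring every weight equals $1/n$, so the weighted sum reduces to the usual empirical mean.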



Consider the lifetime distribution to be the one parameter exponential $\exp(\theta)$
with density $\theta e^{-\theta x}$, $x \ge 0$. Following Stute (1995), the Kaplan-Meier integral
$\int h(\theta,\alpha)\,d\widehat{\mathbb{P}}_n$ may be written as

$$\sum_{j=1}^{n} \omega_{jn}\,h(\theta,\alpha,Y_{j:n}).$$
3=1 
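The MLE of $\theta_0$ is given by

$$\widehat{\theta}_{n,\mathrm{MLE}} = \frac{\sum_{j=1}^{n} \delta_j}{\sum_{j=1}^{n} Y_j}, \tag{5.4}$$

and the approximate MLE (AMLE) of Oakes (1986) is defined by

$$\widehat{\theta}_{n,\mathrm{AMLE}} = \frac{\sum_{j=1}^{n} \omega_{jn}}{\sum_{j=1}^{n} \omega_{jn} Y_{j:n}}. \tag{5.5}$$

We infer from (4.3) that, for $\gamma \in \mathbb{R}\setminus\{0, 1\}$,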



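$$\int h(\theta,\alpha)\,d\widehat{\mathbb{P}}_n
= \frac{\theta^{\gamma}\alpha^{1-\gamma}}{(\gamma-1)[\gamma\theta+(1-\gamma)\alpha]}
- \frac{1}{\gamma}\sum_{j=1}^{n} \omega_{jn}\left[ \left( \frac{\theta}{\alpha} \right)^{\gamma}\exp\{-\gamma(\theta-\alpha)Y_{j:n}\} - 1 \right]
- \frac{1}{\gamma-1}.$$

For $\gamma = 0$,

$$\int h(\theta,\alpha)\,d\widehat{\mathbb{P}}_n = \sum_{j=1}^{n} \omega_{jn}\left[ (\theta-\alpha)Y_{j:n} - \log\left( \frac{\theta}{\alpha} \right) \right].$$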



Observe that this divergence leads to the AMLE, independently of the value of $\theta$.
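To illustrate the independence from $\theta$, the sketch below evaluates the $\gamma = 0$ criterion $\sum_j \omega_{jn}[(\theta-\alpha)Y_{j:n} - \log(\theta/\alpha)]$ for given weights and returns its stationary point in $\alpha$, obtained by setting the derivative to zero. The function names are hypothetical and the weights are simply supplied as a list.

```python
import math


def klm_censored(alpha, theta, ys, omegas):
    """gamma = 0 criterion: sum_j omega_j * [(theta - alpha) * Y_j - log(theta / alpha)]."""
    return sum(w * ((theta - alpha) * y - math.log(theta / alpha))
               for y, w in zip(ys, omegas))


def klm_censored_argmax(ys, omegas):
    """Stationary point in alpha: sum(omega_j) / sum(omega_j * Y_j)."""
    return sum(omegas) / sum(w * y for y, w in zip(ys, omegas))


if __name__ == "__main__":
    times = [0.2, 0.5, 0.9, 1.4]
    omega = [0.25, 0.0, 0.375, 0.375]
    best = klm_censored_argmax(times, omega)
    for escort in (0.5, 2.0):
        # The same alpha maximises the criterion for every escort value.
        print(escort, best, klm_censored(best, escort, times, omega))
```

The criterion is concave in $\alpha$, and the escort value only adds a term that is constant in $\alpha$, so the maximiser is the same for every $\theta$.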

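For $\gamma = 1$,

$$\int h(\theta,\alpha)\,d\widehat{\mathbb{P}}_n = \log\left( \frac{\theta}{\alpha} \right) - \frac{\theta-\alpha}{\theta}
- \sum_{j=1}^{n} \omega_{jn}\left[ \frac{\theta}{\alpha}\exp(-(\theta-\alpha)Y_{j:n}) - 1 \right].$$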



For more details about dual $\phi$-divergence estimators in right censoring we refer to
Cherfi (2011a); we leave this study open for future research. We mention that the
bootstrapped estimators, in this framework, are obtained by replacing the weights
$\omega_{jn}$ by $\pi_{jn}$ in the preceding formulas.

6. Simulations 

In this section, a series of experiments is conducted in order to examine the
performance of the proposed random weighted bootstrap procedure for the D$\phi$DE's
defined in (3.1). We provide numerical illustrations regarding the mean squared
error (MSE) and the coverage probabilities. The computing program codes were
implemented in R.

The values of $\gamma$ are chosen to be $-1$, $0$, $0.5$, $1$, $2$, which correspond, as indicated
above, to the well-known standard divergences: the $\chi^2_m$-divergence, $KL_m$, the Hellinger
distance, $KL$ and the $\chi^2$-divergence, respectively. The sample sizes considered
in our simulations are 25, 50, 75, 100, 150, 200 and the estimates, D$\phi$DE's $\widehat{\alpha}_\phi(\theta)$,
are obtained from 500 independent runs. The value of the escort parameter $\theta$ is taken
to be the MLE, which, under the model, is a consistent estimate of $\theta_0$, and the limit
distribution of the D$\phi$DE $\widehat{\alpha}_\phi(\theta_0)$, in this case, has variance which indeed coincides
with that of the MLE; for more details on this subject, we refer to Keziou (2003, Theorem
2.2, (1) (b)), as mentioned in Section 3.1. The bootstrap weights are chosen to
be

$$(W_{n1}, \ldots, W_{nn}) \sim \mathrm{Dirichlet}(n; 1, \ldots, 1).$$
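One common way to generate such weights is to normalise independent standard exponential draws. The sketch below does this under the assumption that $\mathrm{Dirichlet}(n; 1, \ldots, 1)$ denotes $n$ times a flat Dirichlet vector, so that the weights sum to $n$; the function names are illustrative and are not taken from the authors' R code.

```python
import random


def dirichlet_weights(n, rng):
    """Draw (W_1, ..., W_n) as n times a flat Dirichlet(1, ..., 1) vector."""
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [n * e / total for e in draws]


def weighted_mean(xs, weights):
    """Weighted average of xs, a simple functional to bootstrap."""
    return sum(w * x for x, w in zip(xs, weights)) / sum(weights)


if __name__ == "__main__":
    rng = random.Random(7)
    data = [rng.gauss(0.0, 1.0) for _ in range(25)]
    w = dirichlet_weights(len(data), rng)
    print(sum(w), weighted_mean(data, w))
```

Each call to `dirichlet_weights` yields one bootstrap replicate; repeating the draw $B$ times gives the $B$ weighted estimators used for the confidence intervals.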



Figure 3. Densities of the estimates (both panels: n = 100).

In Figure 3, we plot the densities of the different estimates; the plots show that the
proposed estimators perform reasonably well.

Tables 1 and 2 provide the MSE of various estimates under the Normal model
$N(\theta_0 = 0, 1)$. Here, we mention that the KL based estimator ($\gamma = 1$) is more
efficient than its competitors.

Tables 3 and 4 provide the MSE of various estimates under the Exponential model
$\exp(\theta = 1)$. As expected, the MLE produces the most efficient estimators. A close
look at the results of the simulations shows that the D$\phi$DE's perform well under the
model. For the large sample size $n = 200$, the estimator based on the Hellinger distance
is equivalent to the MLE: in terms of empirical MSE, the D$\phi$DE with
$\gamma = 0.5$ produces the same MSE as the MLE, while the performance of the other
estimators is comparable.

Table 1. MSE of the estimates for the Normal distribution, B=500

  $\gamma$    n = 25    n = 50    n = 75    n = 100   n = 150   n = 200
  -1          0.0687    0.0419    0.0288    0.0210    0.0135    0.0107
  0           0.0647    0.0373    0.0255    0.0192    0.0127    0.0101
  0.5         0.0668    0.0379    0.0257    0.0194    0.0128    0.0101
  1           0.0419    0.0217    0.0143    0.0108    0.0070    0.0057
  2           0.0931    0.0514    0.0331    0.0238    0.0148    0.0112



Table 2. MSE of the estimates for the Normal distribution, B=1000

  $\gamma$    n = 25    n = 50    n = 75    n = 100   n = 150   n = 200
  -1          0.0716    0.0432    0.0285    0.0224    0.0147    0.0099
  0           0.0670    0.0385    0.0255    0.0202    0.0136    0.0093
  0.5         0.0684    0.0391    0.0258    0.0203    0.0137    0.0093
  1           0.0441    0.0230    0.0143    0.0116    0.0078    0.0049
  2           0.0900    0.0522    0.0335    0.0246    0.0156    0.0103



Tables 5, 6, 7 and 8 provide the empirical coverage probabilities of the corresponding
0.95 weighted bootstrap confidence intervals based on B = 500, 1000 weighted
bootstrap estimators. Notice that, as in any other inferential context, the greater
the sample size, the better the empirical coverage probabilities. From the results
reported in these tables, we find that for large values of the sample size $n$, the
empirical coverage probabilities are all close to the nominal level. One can see that
the D$\phi$DE with $\gamma = 2$ has the best empirical coverage probability, which is near the
assigned nominal level.
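The way such empirical coverage figures are produced can be sketched as follows. The example is restricted to the Normal location model with $\gamma = 0$, where the weighted criterion is maximised by a weighted mean, and it uses a percentile interval; both the percentile construction and the reduced numbers of replications are assumptions made to keep the sketch short, not a description of the authors' simulation code.

```python
import random


def bootstrap_estimates(xs, n_boot, rng):
    """Weighted-bootstrap replicates of the sample mean with Dirichlet-type weights."""
    n = len(xs)
    reps = []
    for _ in range(n_boot):
        e = [rng.expovariate(1.0) for _ in range(n)]
        total = sum(e)
        reps.append(sum(w * x for w, x in zip(e, xs)) / total)
    return reps


def percentile_interval(reps, level=0.95):
    """Equal-tailed percentile interval from the bootstrap replicates."""
    s = sorted(reps)
    lo_idx = int((1.0 - level) / 2.0 * (len(s) - 1))
    hi_idx = int((1.0 + level) / 2.0 * (len(s) - 1))
    return s[lo_idx], s[hi_idx]


def empirical_coverage(theta0, n, n_boot, n_runs, rng):
    """Fraction of simulated samples whose interval contains theta0."""
    hits = 0
    for _ in range(n_runs):
        xs = [rng.gauss(theta0, 1.0) for _ in range(n)]
        lo, hi = percentile_interval(bootstrap_estimates(xs, n_boot, rng))
        hits += lo <= theta0 <= hi
    return hits / n_runs


if __name__ == "__main__":
    print(empirical_coverage(0.0, 50, 150, 150, random.Random(123)))
```

With so few runs the reported proportion is itself noisy (its standard error is about two percentage points here), which is worth keeping in mind when comparing entries of the coverage tables that differ by one or two hundredths.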
Table 3. MSE of the estimates for the Exponential distribution, B=500

  $\gamma$    n = 25    n = 50    n = 75    n = 100   n = 150   n = 200
  -1          0.0729    0.0435    0.0313    0.0215    0.0146    0.0117
  0           0.0708    0.0405    0.0280    0.0195    0.0131    0.0104
  0.5         0.0727    0.0415    0.0282    0.0197    0.0131    0.0105
  1           0.0786    0.0446    0.0296    0.0207    0.0136    0.0108
  2           0.1109    0.0664    0.0424    0.0289    0.0178    0.0132



Table 4. MSE of the estimates for the Exponential distribution, B=1000

  $\gamma$    n = 25    n = 50    n = 75    n = 100   n = 150   n = 200
  -1          0.0670    0.0444    0.0295    0.0243    0.0146    0.0111
  0           0.0659    0.0417    0.0269    0.0216    0.0133    0.0102
  0.5         0.0677    0.0427    0.0272    0.0216    0.0135    0.0102
  1           0.0735    0.0458    0.0287    0.0225    0.0140    0.0106
  2           0.1074    0.0697    0.0429    0.0306    0.0183    0.0133



Table 5. Empirical coverage probabilities for the Normal distribution, B=500

  $\gamma$    n = 25    n = 50    n = 75    n = 100   n = 150   n = 200
  -1          0.88      0.91      0.93      0.92      0.95      0.92
  0           0.91      0.92      0.94      0.94      0.94      0.93
  0.5         0.94      0.94      0.94      0.96      0.94      0.93
  1           0.44      0.47      0.54      0.46      0.48      0.51
  2           0.97      0.97      0.96      0.97      0.95      0.95



6.1. Right censoring case. This subsection presents some simulations for the right
censoring case discussed in §5. A sample is generated from exp(1) and an exponential
censoring scheme is used; the censoring distribution is taken to be exp(1/9),
so that the proportion of censoring is 10%. To study the robustness properties of our
Table 6. Empirical coverage probabilities for the Normal distribution, B=1000

  $\gamma$    n = 25    n = 50    n = 75    n = 100   n = 150   n = 200
  -1          0.87      0.90      0.93      0.92      0.93      0.96
  0           0.91      0.94      0.94      0.93      0.94      0.96
  0.5         0.93      0.93      0.95      0.93      0.94      0.96
  1           0.46      0.45      0.48      0.46      0.45      0.50
  2           0.96      0.97      0.96      0.95      0.96      0.96



Table 7. Empirical coverage probabilities for the Exponential distribution, B=500

  $\gamma$    n = 25    n = 50    n = 75    n = 100   n = 150   n = 200
  -1          0.67      0.83      0.87      0.91      0.93      0.92
  0           0.73      0.87      0.91      0.93      0.96      0.93
  0.5         0.76      0.88      0.91      0.94      0.96      0.93
  1           0.76      0.88      0.90      0.95      0.97      0.93
  2           0.76      0.89      0.91      0.96      0.96      0.94



Table 8. Empirical coverage probabilities for the Exponential distribution, B=1000

  $\gamma$    n = 25    n = 50    n = 75    n = 100   n = 150   n = 200
  -1          0.70      0.79      0.90      0.91      0.92      0.91
  0           0.76      0.84      0.91      0.92      0.93      0.92
  0.5         0.78      0.85      0.93      0.94      0.94      0.93
  1           0.78      0.87      0.94      0.94      0.95      0.94
  2           0.78      0.88      0.95      0.95      0.96      0.95



estimators, 20% of the observations are contaminated by exp(5). The D$\phi$DE's
are calculated for samples of sizes 25, 50, 100, 150 and the whole procedure is
repeated 500 times. We can see from Table 9 that the D$\phi$DE's perform well under

Table 9. MSE of the estimates for the Exponential distribution under right censoring

  $\gamma$    n = 25    n = 50    n = 100   n = 150
  -1          0.1088    0.0877    0.0706    0.0563
  0           0.1060    0.0843    0.0679    0.0538
  0.5         0.1080    0.0860    0.0689    0.0544
  1           0.1150    0.0914    0.0724    0.0567
  2           0.1535    0.1276    0.1019    0.0787



the model in terms of MSE, and are an attractive alternative to the AMLE.
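For reference, the 10% censoring proportion of this design can be checked directly, assuming that $\exp(\lambda)$ denotes the exponential distribution with rate $\lambda$ and that the censoring time $C \sim \exp(1/9)$ is independent of the survival time $T \sim \exp(1)$:

$$P(C < T) = \int_0^\infty \frac{1}{9}\, e^{-c/9}\, e^{-c}\,dc = \frac{1/9}{1 + 1/9} = \frac{1}{10}.$$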

Table 10. Empirical coverage probabilities for the Exponential distribution under right censoring

  $\gamma$    n = 25    n = 50    n = 100   n = 150
  -1          0.55      0.63      0.63      0.64
  0           0.59      0.66      0.64      0.64
  0.5         0.61      0.66      0.64      0.65
  1           0.63      0.67      0.66      0.66
  2           0.64      0.70      0.68      0.67



Table 10 shows the variation in coverage of nominal 95% asymptotic confidence
intervals according to the sample size. There is clear undercoverage of the confidence
intervals: the D$\phi$DE's have poor coverage probabilities due to the censoring
effect. However, for small and moderate sample sizes, the D$\phi$DE associated with
$\gamma = 2$ outperforms the AMLE.

Under contamination, the performance of our estimators decreases considerably.
Such findings are evidence of the need for more adequate procedures for right
censored data.
Table 11. MSE of the estimates for the Exponential distribution
under right censoring, 20% of contamination

  $\gamma$    n = 25    n = 50    n = 100   n = 150
  -1          0.1448    0.1510    0.1561    0.1591
  0           0.1482    0.1436    0.1409    0.1405
  0.5         0.1457    0.1402    0.1360    0.1342
  1           0.1462    0.1389    0.1332    0.1300
  2           0.1572    0.1442    0.1338    0.1266



Table 12. Empirical coverage probabilities for the Exponential distribution
under right censoring, 20% of contamination

  $\gamma$    n = 25    n = 50    n = 100   n = 150
  -1          0.44      0.49      0.54      0.57
  0           0.46      0.49      0.53      0.57
  0.5         0.46      0.49      0.53      0.57
  1           0.45      0.49      0.53      0.57
  2           0.45      0.49      0.52      0.53



Remark 6.1. In order to extract methodological recommendations for the use of an
appropriate divergence, it will be interesting to conduct extensive Monte Carlo
experiments for several divergences or to investigate theoretically the problem of the
choice of the divergence which leads to an "optimal" (in some sense) estimate in
terms of efficiency and robustness, which would go well beyond the scope of the
present paper. Another challenging task is how to choose the bootstrap weights for
a given divergence in order to obtain, for example, an efficient estimator.

7. Appendix 



This section is devoted to the proofs of our results. The previously defined notation 
continues to be used below. 
7.1. Proof of Theorem 3.1. Proceeding as van der Vaart and Wellner (1996) in
their proof of the Argmax theorem, i.e., Corollary 3.2.3, it is straightforward to show
the consistency of the bootstrapped estimates $\widehat{\alpha}^*_\phi(\theta)$.

□ 

Remark 7.1. Note that the proof techniques of Theorem 3.2 are largely inspired
by those of Cheng and Huang (2010), and changes have been made in order to
adapt them to our purpose.

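7.2. Proof of Theorem 3.2. Keep in mind the following definitions

$$\mathbb{G}_n := \sqrt{n}\,(\mathbb{P}_n - \mathbb{P}_{\theta_0})$$

and

$$\mathbb{G}^*_n := \sqrt{n}\,(\mathbb{P}^*_n - \mathbb{P}_n).$$

In view of the fact that $\mathbb{P}_{\theta_0}\frac{\partial}{\partial\alpha}h(\theta,\theta_0) = 0$, a little calculation shows that

$$\sqrt{n}\,\mathbb{P}_{\theta_0}\left[ \frac{\partial}{\partial\alpha}h(\theta,\widehat{\alpha}^*_\phi(\theta)) - \frac{\partial}{\partial\alpha}h(\theta,\theta_0) \right]
= -\mathbb{G}^*_n\frac{\partial}{\partial\alpha}h(\theta,\theta_0) - \mathbb{G}_n\frac{\partial}{\partial\alpha}h(\theta,\theta_0)
+ \mathbb{G}^*_n\left[ \frac{\partial}{\partial\alpha}h(\theta,\theta_0) - \frac{\partial}{\partial\alpha}h(\theta,\widehat{\alpha}^*_\phi(\theta)) \right]
+ \mathbb{G}_n\left[ \frac{\partial}{\partial\alpha}h(\theta,\theta_0) - \frac{\partial}{\partial\alpha}h(\theta,\widehat{\alpha}^*_\phi(\theta)) \right]
+ \sqrt{n}\,\mathbb{P}^*_n\frac{\partial}{\partial\alpha}h(\theta,\widehat{\alpha}^*_\phi(\theta)).$$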
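Consequently, we have the following inequality

$$\left\| \sqrt{n}\,\mathbb{P}_{\theta_0}\left[ \frac{\partial}{\partial\alpha}h(\theta,\widehat{\alpha}^*_\phi(\theta)) - \frac{\partial}{\partial\alpha}h(\theta,\theta_0) \right] \right\|
\le \left\| \mathbb{G}^*_n\frac{\partial}{\partial\alpha}h(\theta,\theta_0) \right\|
+ \left\| \mathbb{G}_n\frac{\partial}{\partial\alpha}h(\theta,\theta_0) \right\|
+ \left\| \mathbb{G}^*_n\left[ \frac{\partial}{\partial\alpha}h(\theta,\theta_0) - \frac{\partial}{\partial\alpha}h(\theta,\widehat{\alpha}^*_\phi(\theta)) \right] \right\|
+ \left\| \mathbb{G}_n\left[ \frac{\partial}{\partial\alpha}h(\theta,\theta_0) - \frac{\partial}{\partial\alpha}h(\theta,\widehat{\alpha}^*_\phi(\theta)) \right] \right\|
+ \left\| \sqrt{n}\,\mathbb{P}^*_n\frac{\partial}{\partial\alpha}h(\theta,\widehat{\alpha}^*_\phi(\theta)) \right\|
:= G_1 + G_2 + G_3 + G_4 + G_5. \tag{7.1}$$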



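According to Theorem 2.2 in Praestgaard and Wellner (1993), under condition (A.4),
we have $G_1 = O^{o}_{\mathbb{P}_W}(1)$ in $\mathbb{P}_{\theta_0}$-probability. In view of the CLT, we have $G_2 = O_{\mathbb{P}_{\theta_0}}(1)$.
By applying a Taylor series expansion, we have

$$\mathbb{G}^*_n\left[ \frac{\partial}{\partial\alpha}h(\theta,\theta_0) - \frac{\partial}{\partial\alpha}h(\theta,\widehat{\alpha}^*_\phi(\theta)) \right]
= (\theta_0 - \widehat{\alpha}^*_\phi(\theta))^{\top}\,\mathbb{G}^*_n\frac{\partial^2}{\partial\alpha^2}h(\theta,\overline{\alpha}), \tag{7.2}$$

where $\overline{\alpha}$ is between $\widehat{\alpha}^*_\phi(\theta)$ and $\theta_0$. By condition (A.5) and Theorem 2.2 in Praestgaard
and Wellner (1993), we conclude that the right term in (7.2) is of order
$O^{o}_{\mathbb{P}_W}(\|\widehat{\alpha}^*_\phi(\theta) - \theta_0\|)$ in $\mathbb{P}_{\theta_0}$-probability. Since $\widehat{\alpha}^*_\phi(\theta)$ is assumed to be
consistent, we have $G_3 = o^{o}_{\mathbb{P}_W}(1)$ in $\mathbb{P}_{\theta_0}$-probability. An analogous argument
yields that

$$\mathbb{G}_n\left[ \frac{\partial}{\partial\alpha}h(\theta,\theta_0) - \frac{\partial}{\partial\alpha}h(\theta,\widehat{\alpha}^*_\phi(\theta)) \right]$$

is of order $O_{\mathbb{P}_{\theta_0}}(\|\widehat{\alpha}^*_\phi(\theta) - \theta_0\|)$; by the consistency of $\widehat{\alpha}^*_\phi(\theta)$, we have $G_4 = o^{o}_{\mathbb{P}_W}(1)$
in $\mathbb{P}_{\theta_0}$-probability. Finally, $G_5 = 0$ based on (3.2). In summary, (7.1) can be
rewritten as follows

$$\left\| \sqrt{n}\,\mathbb{P}_{\theta_0}\left[ \frac{\partial}{\partial\alpha}h(\theta,\widehat{\alpha}^*_\phi(\theta)) - \frac{\partial}{\partial\alpha}h(\theta,\theta_0) \right] \right\|
\le O^{o}_{\mathbb{P}_W}(1) + O^{o}_{\mathbb{P}_{\theta_0}}(1) \tag{7.3}$$

in $\mathbb{P}_{\theta_0}$-probability. On the other hand, by a Taylor series expansion, we can write

$$\mathbb{P}_{\theta_0}\left[ \frac{\partial}{\partial\alpha}h(\theta,\alpha) - \frac{\partial}{\partial\alpha}h(\theta,\theta_0) \right]
= -(\alpha - \theta_0)^{\top} S + O\left( \|\alpha - \theta_0\|^2 \right). \tag{7.4}$$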
Clearly it is straightforward to combine (7.4) with (7.3), to infer the following

$$\sqrt{n}\,\|S\|\,\|\widehat{\alpha}^*_\phi(\theta) - \theta_0\| \le O^{o}_{\mathbb{P}_W}(1) + O^{o}_{\mathbb{P}_{\theta_0}}(1) + O^{o}_{\mathbb{P}_W}\left( \sqrt{n}\,\|\widehat{\alpha}^*_\phi(\theta) - \theta_0\|^2 \right) \tag{7.5}$$

in $\mathbb{P}_{\theta_0}$-probability. By considering again the consistency of $\widehat{\alpha}^*_\phi(\theta)$ and condition
(A.3), and making use of (7.5), we complete the proof of (3.9).
We next prove (3.10). Introduce



H 2 :-- 
H 4 :-- 



±h(e,ai(e))-£.h(e,e ) 
uf: ^-h(o, a;(e)) - ^F n ^-h(6, a^o)). 



By some algebra, we obtain 



V / j= i 

Obviously, $H_1 = O^{o}_{\mathbb{P}_W}(n^{-1/2})$ in $\mathbb{P}_{\theta_0}$-probability and $H_2 = O_{\mathbb{P}_{\theta_0}}(n^{-1/2})$. We also
know that the order of $H_3$ is $O^{o}_{\mathbb{P}_W}(n^{-1/2})$ in $\mathbb{P}_{\theta_0}$-probability. Using (2.8) and (3.2)
we obtain that $H_4 = 0$.
Therefore, we have established



d_ 
dot 



h(e,e ) + O]Pg (i) 



(7.6) 



in $\mathbb{P}_{\theta_0}$-probability. To analyze the left hand side of (7.6), we rewrite it as



nP 0n 



h{e,a 4> {0))-—h{6,e o) 

OCX OCX 
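By a Taylor expansion, we obtain

$$\sqrt{n}\,S\left( \widehat{\alpha}^*_\phi(\theta) - \widehat{\alpha}_\phi(\theta) \right) + O_{\mathbb{P}_{\theta_0}}(n^{-1/2}) + O^{o}_{\mathbb{P}_W}(n^{-1/2})
= \mathbb{G}^*_n\frac{\partial}{\partial\alpha}h(\theta,\theta_0) + o_{\mathbb{P}_{\theta_0}}(1) + o^{o}_{\mathbb{P}_W}(1) \tag{7.7}$$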

in $\mathbb{P}_{\theta_0}$-probability. Keep in mind that, under condition (A.3), the matrix $S$ is nonsingular.
Multiply both sides of (7.7) by $S^{-1}$ to obtain (3.10). An application of
Praestgaard and Wellner (1993, Lemma 4.6), under the bootstrap weight conditions,
thus implies (3.11). Using Broniatowski and Keziou (2009, Theorem 3.2) and van der
Vaart (1998, Lemma 2.11), it easily follows that

$$\sup_{x \in \mathbb{R}^d}\left| \mathbb{P}_{\theta_0}\left( \sqrt{n}\,(\widehat{\alpha}_\phi(\theta) - \theta_0) \le x \right) - \mathbb{P}\left( N(0,\Sigma) \le x \right) \right| = o_{\mathbb{P}_{\theta_0}}(1). \tag{7.8}$$

By combining (3.11) and (7.8), we readily obtain the desired conclusion (3.12).

□ 